(Department of Medical Statistics, University of Birmingham) 


SAMPLING FROM A DISCRETE UNIVERSE 


I. Sampling without replacement from a finite universe 


By LANCELOT HOGBEN and 
KENNETH W. CROSS 


1. Introduction. 


The founding fathers of the theory of probability concerned 
themselves largely with the class of problems which we now call 
sampling with or without replacement from a finite universe. 
Since de Moivre and Laplace developed the normal curve as a limit 
of the binomial, theoretical statistics has been largely occupied with 
the consequences of sampling from a hypothetical normal universe, 
i.e. a universe of which the number of score classes is ex hypothesi 
infinite, and hence of a universe in which the replacement condition 
is ordinarily irrelevant since the extraction of a finite sample therefrom 
does not materially change its composition. Though no continuous 
distribution is consistent in the last resort with a particulate 
view of matter, the use of such is not uncommonly satisfactory in 
practice as a descriptive device; but we should not lightly dismiss 
the obligation to explore the empirical credentials of an arbitrary 
postulate because it happens to be mathematically convenient to 
invoke it. There arise in practice many situations w.r.t. which it is 
not merely improper to assume a normally constituted universe as 
sufficiently descriptive of the reservoir from which we sample, but 
equally improper to assume that the extraction of a sample therefrom 
does not materially change its composition. Such situations may arise 
in statistical inspection (Quality Control) and in the selection of 
statistical documents (e.g. hospital records). They have special importance 
in that branch of statistical genetics designated by Dahlberg as the 
theory of isolates. 

The neglect of the theory of sampling without replacement is the 
more remarkable because we may regard sampling with replacement 
as the limiting case of non-replacement sampling when the universe 
becomes infinite. If we speak of a finite universe, we commonly imply 
a universe made up of a finite number (N) of items to each of which 
we can attach a score value; but if N is sufficiently large in comparison 
with the size (r) of a sample taken therefrom, no sensible error 
results from the assumption that the composition of the residual 
universe is the same as the universe before extraction. Within such 
an assemblage of discrete score values we may assign items which 
bear the same score to a particular class, the possible number (n) 
of score classes being equal to, as is true of the rectangular universe, 
or less than N. Whether a normal, or other continuous, distribution 
will prove to be a satisfactory approximate description of a universe 
so conceived depends primarily upon whether the number of score 
classes is large; but an indefinitely large value of N is consistent 
with a very small value of n. 

For the binomial universe of 2 score classes n = 2; and we are 
free to conceive it in terms of an enumerable infinity of items or of a 
finite number N > 2. So conceived, the numbers of items ($n_1$ and $n_2$) 
respectively allocated to one or other class may each be an enumerable 
infinity, though the ratios p and q defined by the relations $Np = n_1$ 
and $Nq = n_2$ are themselves finite. This is the assumption on which 
we proceed to derive the sampling distribution referable to success and 
failure of a treatment in a clinical trial. In contradistinction to the 
Finite Universe of which the distribution of score classes, henceforth 
called the unit-sample distribution, is necessarily discrete, a universe 
may thus be both discrete and infinite. If so, the replacement condition 
is trivial; but we shall expect to obtain a good descriptive curve 
for the distribution of the r-fold sample only if r is of sufficient size 
to accommodate many consecutive score classes. 

The operations exactly descriptive of sampling from a universe 
w.r.t. which n alone is finite or of a universe w.r.t. which both n and N 
are finite equally belong to the domain of the finite calculus; but an 
exact solution may be impossible and will be very laborious unless r 
is small. It is therefore of practical importance to define in what 
circumstances a normal or other continuous curve can give a sufficiently 
precise description of the sampling process. Thus the discrete universe 
invites enquiry of two sorts: (a) how the absolute size of the 
sample affects the facies of the sampling distribution when the universe 
itself is both infinite and discrete; (b) how the sampling fraction 
determines the character of the former when the latter is discrete and 
finite. As regards (a), we shall later see that the infinite binomial 
universe merits more attention than it has hitherto received, and 
universes of 3 or more classes have hitherto attracted little attention. 
Concerning (b), it suffices to say that the hypergeometric distribution 
of non-replacement sampling from a 2-class universe circumscribes 
most of what we know already, and is even so incomplete. 

With the end in view stated above, it will be convenient to 
employ a distinction drawn by Hogben and Waterhouse [1949] between 
two types of scoring they denote respectively as taxonomic 
and representative. These terms are more explicit for the present 
purpose than the customary dichotomy between sampling of attributes 
and sampling of measurements or between qualitative and quantitative 
statistics, since the definition of an attribute may be quantitative 
and a quantitative statistic is not necessarily metrical. An attribute, 
i.e. the criterion of a class, may be qualitative or quantitative in the 
sense that we may distinguish individuals as: (a) yellow or green, 
Protestant or Catholic; (b) having less than or more than 4 million 
blood corpuscles per mm³, or weighing less than 5½ lb. at birth. 
Both specifications included under (b) are quantitative in the 
ordinary sense of the term but only one of them is metrical. In statistical 
problems one classifies samples: (a) by enumeration of individuals 
with a common attribute which we may either define in qualitative 
or quantitative (enumerative or metrical) terms; (b) by some 
representative figure (e.g. sum, mean or median) which takes account 
of a numerical score attached to each individual item of the universe. 
It is the first that we speak of below as taxonomic scoring, 
the second as representative. 

In the particular case of a binomial universe (n = 2), we may 
score the unit sample as 0 or 1 and the taxonomic then becomes a 
special case of the representative, but only in virtue of the fact that 
every individual member of the universe being classifiable as a 
member of class A or not-A has a unit score of 0 or 1. In the same way, 
of course, the difference between a rectangular and a binomial 
universe breaks down when $q = \tfrac{1}{2} = p$, if successive terms of 
$(q + p)^1$ define the unit sampling distribution. When n > 2 the 
distinction between taxonomic scoring and representative scoring 
as defined above is fundamental. In the domain of binary taxonomic 
scoring the raw scores definitive of an r-fold sample run from $0, 1, 2 \ldots r$ 
by unit steps. The origin of the distribution is not necessarily zero 
if the method of scoring is representative; but this will not affect 
the evaluation of the mean moments. The scale of the distribution of 
the score-sum is not necessarily unity; but this will not affect the 
evaluation of the β-coefficients. By the same token, the β-coefficients 
of the mean score will be the same as those of the score-sum 
in the representative domain; and the same applies to the proportionate 
and raw scores in the binary taxonomic domain as a 
limiting case. Contrariwise, the algebraic properties of the distribution 
of the difference between the mean scores (or proportionate scores) 
of a-fold and b-fold samples from the same universe are not identical 
with those of the corresponding score-sums (or raw-scores) for 
reasons more fully discussed in a later communication of this series. 
Accordingly, it will be necessary to consider separately the difference 
distribution w.r.t. the score-sum (or raw score) and the mean (or 
proportionate) score. 

In pursuing our objective, as stated above, we shall rely mainly 
upon the method of curve fitting by moments developed by Karl 
Pearson; but it is important to emphasise that the adequacy of a 
curve fitted to a discrete distribution by the Pearsonian method is 
an empirical issue which calls for arithmetical investigation to 
justify its credentials in a given situation. It is also pertinent to 
recall at the outset that all moments of finite order referable to a 
discrete universe are themselves finite. 

In the Pearsonian system of model sampling distributions, moments 
higher than those of the fourth order, and hence β-coefficients of 
order higher than $\beta_2$, do not occur as parameters. The justification 
of this scarcely calls for comment when the number of score classes 
in the universe is such that the contour of its histogram closely 
approaches that of a continuous curve. In any treatment of the finite 
universe, however, this postulate may be inappropriate as the 
following example will suffice to show. We suppose that the range of 
empirical observations extends only over scale divisions -1, 0, +1 
with relative frequencies 1, 4, 1. The values of $\beta_1$ and $\beta_2$ are then 
respectively 0 and 3 as for the unit sampling distribution for the 
normal universe. As is likewise true of a normal universe, the distribution 
of the mean score retains the same values (of $\beta_1$ and $\beta_2$) for 
samples of 2, 3 or more; and all β-coefficients of odd order ($\beta_3$, $\beta_5$) 
are zero for samples of any size. Needless to say, however, the values 
of higher Pearson* coefficients of even order ($\beta_4$, $\beta_6$ etc.) for the 


* We extend, in the manner of Kendall [1947], the notion of these parameters 
of a distribution beyond Pearson’s original conception. 
unit sampling distribution of such a 3-class universe depart widely 
from their normal values. When interpreting the consequences of 
sampling from a finite universe, it is therefore profitable to give 
special attention to moments of higher order than the second. We 
here employ Kendall’s definitions, viz. if $m_k$ is the k-th mean moment 
of a distribution 

$$\beta_{2k+1} = \frac{m_3\, m_{2k+3}}{m_2^{\,k+3}} \tag{1.01}$$

$$\beta_{2k} = \frac{m_{2k+2}}{m_2^{\,k+1}} \tag{1.02}$$

When we speak of the fitting curve for a discrete sampling 
distribution as satisfactory, our criterion of goodness of fit will 
be that it assigns to some specified range of score values a frequency 
which differs from its true value by an error numerically less than a 
predetermined figure. For instance, we may require that the frequency 
assigned to a range of scores from, say, $-\infty$ to x with actual frequency 
0.95 will lie between 0.945 and 0.955, the proportionate error being 
of the order of 0.5%. A paramount consideration dictating the 
particular distribution postulated with more or less plausibility 
as an approximate description of our universe is then the possibility 
of deducing a distribution of extracted samples in a form suitable 
for tabulation. When our concern, as in this communication, is 
with sampling without replacement our universe is finite and we 
may be able to specify its structure (i.e. its unit-sample distribution) 
exactly. A first approach to the problem of fitting to a discrete 
distribution a curve which is satisfactory in the sense defined above 
then involves a specification of the moments of the r-fold (or other) 
sample distribution in terms of the moments of the u-s-d. Such 
expressions developed below refer to scoring in the representative 
domain, and include the result of sampling from a finite binomial 
universe as a special case. 

When we speak of the number (n) of score classes of the u-s-d 
in this context, we imply that every such class contains at least one 
item. If the increment of score is fixed, the number of score classes 
with at least one item in the replacement sampling distribution is 
then r(n−1) + 1. The number of score classes of an r-fold non-
replacement sample distribution when N is finite will be less than 
this, if the number of items in either the lowest or highest score 
valued class of the u-s-d is less than r. It therefore goes without 
saying that the number of score classes of the finite sample from the 
discrete universe, i.e. a universe itself consisting of a finite number of 
score classes, is always finite; and that no infinite series can exactly 
describe sampling from such a universe. 
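The count of score classes stated above can be illustrated by direct enumeration. The sketch below is hypothetical: the 6-item universe and the function names are chosen only for illustration. It counts the distinct score sums of an r-fold sample drawn with and without replacement when the highest score class holds fewer than r items.

```python
from itertools import combinations, product

def classes_with_replacement(universe, r):
    """Number of distinct score sums when each draw is replaced."""
    scores = sorted(set(universe))
    return len({sum(s) for s in product(scores, repeat=r)})

def classes_without_replacement(universe, r):
    """Number of distinct score sums when items are not replaced."""
    return len({sum(s) for s in combinations(universe, r)})

universe = [0, 0, 0, 1, 1, 2]   # hypothetical: N = 6 items in n = 3 score classes
r = 3
n = len(set(universe))
with_repl = classes_with_replacement(universe, r)
without_repl = classes_without_replacement(universe, r)
print(with_repl, r * (n - 1) + 1, without_repl)
```

Here the replacement distribution has r(n−1) + 1 = 7 score classes, while the non-replacement distribution has only 5, since the single item of score 2 cannot be drawn three times.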


2. Symbolism employed. 


To take stock of sampling without replacement, it is necessary 
to label: (a) every item in the total universe as an individual entity 
regardless of the possibility that the numerical value of the score 
we attach to it is identical with that of any other item of the same 
score class; (b) every item of one sample as an entity distinct from 
any item present in any other sample from the same universe. We 
can do this by using a right hand subscript (u) to label a score ($x_u$) 
as that of an item chosen at the u-th draw, in which case we label the 
scores of the residual items in the (N−a)-fold universe after extraction 
of an a-fold sample as $x_{a+1}, x_{a+2}, \ldots, x_{N-1}, x_N$. The meaning of the 
subscripts (a+1), (a+2) etc. is neither more nor less arbitrary 
than the meaning we attach to the third card taken in a simultaneous 
5-fold draw. Regardless of its denomination, we then visualise each 
item of the sample and of the residual universe as one of an arbitrary 
linear sequence invoked for purposes of identification of the card. 
If we adopt this convention for the unit score, it is convenient to 
designate the score sum of the first a-fold sample as ${}_a x_0$, e.g. for the 
score sum of the 4-fold sample, we shall write ${}_4 x_0 = x_1 + x_2 + x_3 + x_4$. 
If we now take a second (b-fold) sample after removing a items, we 
may label the score-sum as ${}_b x_a$, e.g. we write for the 3-fold sample 
extracted after previously taking a 4-fold sample ${}_3 x_4 = x_5 + x_6 + x_7$. 
For the score difference between the initial (a-fold) and subsequent 
(b-fold) sample score sums we shall write ${}_{(a-b)}x_0 = {}_a x_0 - {}_b x_a$, allowing 
for the possibility that we may need to subtract the score sums of 
samples each taken subsequently to another, e.g. ${}_{(b-c)}x_a = {}_b x_a - {}_c x_{a+b}$. 

To express moments economically, it will be convenient to 
adopt a fixed convention for sums of powers and powers of sums, 
viz: (a) for the sum of the k-th powers of the a-fold sample ${}_a x_0^k$, so 
that ${}_4 x_0^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2$; (b) for the k-th power of the a-fold 
sample score sum $({}_a x_0)^k$, so that $({}_4 x_0)^2 = (x_1 + x_2 + x_3 + x_4)^2$. For the 
unit sample the brackets are redundant, i.e. $x_u^k = (x_u)^k$. For the 
k-th moment about zero of the total universe of scores, henceforth 
designated zero moments of the unit sample distribution, we shall 
write $\mu_k = E(x_u^k)$, for the distribution of the score sum of the first 
a-fold sample ${}_a\mu_k = E({}_a x_0)^k$ and for that of the distribution of the 
score sum of a subsequent b-fold sample ${}_{b.a}\mu_k = E({}_b x_a)^k$. For mean 
moments (i.e. moments about the mean) we shall employ $m_k$, ${}_a m_k$, 
${}_{b.a}m_k$ in the same way. 


The following relations summarise the foregoing definitions: 


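$${}_a x_0^k = \sum_{u=1}^{u=a} x_u^k \,; \qquad {}_b x_a^k = \sum_{u=a+1}^{u=a+b} x_u^k \tag{2.01}$$

$$({}_a x_0)^k = \Big(\sum_{u=1}^{u=a} x_u\Big)^k \,; \qquad ({}_b x_a)^k = \Big(\sum_{u=a+1}^{u=a+b} x_u\Big)^k \tag{2.02}$$

$${}_{(a-b)}x_0 = {}_a x_0 - {}_b x_a = \sum_{u=1}^{u=a} x_u - \sum_{u=a+1}^{u=a+b} x_u \tag{2.03}$$

$${}_{(b-c)}x_a = {}_b x_a - {}_c x_{a+b} = \sum_{u=a+1}^{u=a+b} x_u - \sum_{u=a+b+1}^{u=a+b+c} x_u \tag{2.04}$$

$$\mu_k = E(x_u^k) = \frac{1}{N}\sum_{u=1}^{u=N} x_u^k = \frac{{}_N x_0^k}{N} \tag{2.05}$$

$${}_N x_0^k = N\mu_k \tag{2.06}$$

$${}_a\mu_k = E({}_a x_0)^k \tag{2.07}$$

$${}_{(a+b)}\mu_k = E({}_{(a+b)}x_0)^k \tag{2.08}$$

$${}_{(a-b)}\mu_k = E({}_{(a-b)}x_0)^k \tag{2.09}$$
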
In the above the operation E(...) signifies, as customarily, 


taking the mean of all possible values of the argument. In what 
follows, we may sidestep unnecessary labour, when it is necessary 
to investigate the relevance of choice-order by a partial notation. 
$E_{v.u}(\ldots)$ signifies its mean value for a fixed value of $x_u$, and $E_{u.v}(\ldots)$ 
for a fixed value of $x_v$. In the same way, $E_{w.uv}(\ldots)$ signifies the 
mean value of a function of $x_u$, $x_v$, $x_w$ for fixed values of both $x_u$ 
and $x_v$. 
In this notation 

$$E_u . E_{v.u}(\ldots) = E(\ldots) = E_v . E_{u.v}(\ldots)$$

$$E_u . E_{v.u} . E_{w.uv}(\ldots) = E(\ldots) = E_w . E_{v.w} . E_{u.vw}(\ldots) \ \text{etc.} \tag{2.10}$$


The order of the appropriate operations is immaterial. In this 
symbolism $E_u(f) = E(f)$ if f is a function of u alone.¹ 


3. Moments of the r-fold sample from the infinite discrete distribution. 


Though our main concern in this communication is sampling 
without replacement from a finite universe, it will clarify our task 
if we first indicate a method by which it is possible to specify the 
Pearson coefficients of the score sum or mean score of a distribution 
which is discrete but not finite or of a finite universe on the assumption 
that the replacement condition holds good. For any discrete universe 
moments of finite order are finite and the k-th moment of the distribution 
of the (a+1)-fold sample score-sum is in the notation of § 2: 

$${}_{(a+1)}\mu_k = E({}_{(a+1)}x_0)^k = E({}_a x_0 + x_{a+1})^k$$

$${}_{(a+1)}\mu_k = \sum_{w=0}^{w=k} k_{(w)}\, E\big[({}_a x_0)^w . x_{a+1}^{\,k-w}\big] \tag{3.01}$$


If the replacement condition holds good, the value of the (a+1)-th 
unit sample ($x_{a+1}$) does not depend on the score sum (${}_a x_0$) of the 
antecedent a-fold sample, and its mean value is that of the first 
unit sample, i.e. 

$$E\big[({}_a x_0)^w . x_{a+1}^{\,k-w}\big] = E({}_a x_0)^w . E(x_{a+1})^{k-w} = E({}_a x_0)^w . E(x_1)^{k-w}$$

Whence we derive: 

$${}_{(a+1)}\mu_k = \sum_{w=0}^{w=k} k_{(w)} . {}_a\mu_w . \mu_{k-w} \tag{3.02}$$


¹ This notation is consistent with the customary convention for partial 
correlation and equally appropriate to partial differentiation, e.g. $D_{x.y}(z)$ for the 
partial differential of z = f(x, y) with respect to x and $D_{y.x}(z)$ for the partial differential 
of z with respect to y; similarly, $D_{x.yz}(u)$ for the partial differential of u = f(x, y, z) 
with respect to x and so forth. 


In 3.01 and elsewhere we follow Aitken, i.e. $k_{(w)} = k! \div w!\,(k-w)!$ 
When there is replacement we may write the score of the unit 
sample from the u-s-d mean as $X_u = (x_u - \mu_1)$ and that of the score 
sum of the a-fold sample from its mean value, which is $E({}_a x_0) = a.\mu_1$, 
as ${}_a X_0 = ({}_a x_0 - a.\mu_1)$, whence $E(X_u)^k = m_k$ and $E({}_a X_0)^k = {}_a m_k$. 

For the k-th mean moment of the (a+1)-fold sample score-sum 
distribution we then have: 

$${}_{(a+1)}m_k = E({}_a X_0 + X_{a+1})^k$$

$${}_{(a+1)}m_k = \sum_{w=0}^{w=k} k_{(w)} . {}_a m_w . m_{k-w} \tag{3.03}$$


For the 2-fold sample (a=1) the foregoing expressions (3.02) and 
(3.03) involve only moments of the unit sample distribution, i.e. 
$\mu_w$ and $\mu_{k-w}$, or $m_w$ and $m_{k-w}$. By iteration we then see that we 
can expand the moments of the a-fold sample score sum in terms of a 
series involving moments of the unit sample with coefficients generated 
by the algorithms for summation of figurate numbers. We then find 
that: 


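$${}_a m_2 = a\,m_2 \tag{3.04}$$

$${}_a m_3 = a\,m_3 \tag{3.05}$$

$${}_a m_4 = a\,m_4 + 3a^{(2)} m_2^2 \tag{3.06}$$

$${}_a m_5 = a\,m_5 + 10a^{(2)} m_3 . m_2 \tag{3.07}$$

$${}_a m_6 = a\,m_6 + 15a^{(2)} m_4 m_2 + 10a^{(2)} m_3^2 + 15a^{(3)} m_2^3 \tag{3.08}$$

$${}_a m_7 = a\,m_7 + 21a^{(2)} m_5 . m_2 + 35a^{(2)} m_4 . m_3 + 105a^{(3)} m_3 . m_2^2 \tag{3.09}$$

$${}_a m_8 = a\,m_8 + 28a^{(2)} m_6 . m_2 + 56a^{(2)} m_5 . m_3 + 35a^{(2)} m_4^2 + 210a^{(3)} m_4 . m_2^2 + 280a^{(3)} m_3^2 . m_2 + 105a^{(4)} m_2^4 \tag{3.10}$$

Here $a^{(2)} = a(a-1)$, $a^{(3)} = a(a-1)(a-2)$ etc. denote factorial powers. 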
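The iteration described above can be sketched in Python. The example below is a minimal illustration rather than the authors' own computation: it applies the recurrence (3.03) repeatedly to the exact mean moments of a hypothetical skew unit-sample distribution and compares the result with the closed forms (3.06) and (3.08), reading $a^{(r)}$ as the factorial power $a(a-1)\cdots(a-r+1)$. All names and the example universe are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def falling(x, r):
    """Factorial power x(x-1)...(x-r+1)."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def unit_mean_moments(scores, freqs, kmax):
    """Exact mean moments m_0..m_kmax of the unit-sample distribution."""
    total = sum(freqs)
    mean = Fraction(sum(s * f for s, f in zip(scores, freqs)), total)
    return [sum(f * (s - mean) ** k for s, f in zip(scores, freqs)) / total
            for k in range(kmax + 1)]

def sum_mean_moments(m, a):
    """Mean moments of the a-fold score sum (with replacement), by iterating (3.03)."""
    cur = list(m)
    for _ in range(a - 1):
        cur = [sum(comb(k, w) * cur[w] * m[k - w] for w in range(k + 1))
               for k in range(len(m))]
    return cur

m = unit_mean_moments([0, 1, 3], [2, 1, 1], 6)   # hypothetical skew universe
a = 5
am = sum_mean_moments(m, a)
closed_4 = a * m[4] + 3 * falling(a, 2) * m[2] ** 2
closed_6 = (a * m[6] + 15 * falling(a, 2) * m[4] * m[2]
            + 10 * falling(a, 2) * m[3] ** 2 + 15 * falling(a, 3) * m[2] ** 3)
print(am[4] == closed_4, am[6] == closed_6)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that agreement between the recurrence and the closed forms is an identity rather than a floating-point coincidence.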


If we use the symbol ${}_a\beta_r$ for the Pearson coefficient of order r referable 
to the a-fold mean score or score sum distribution and $\beta_r$ for the 
corresponding Pearson coefficient of the unit sample distribution, 
we thus derive¹: 


1 The expressions for the moments in 3.04 to 3.10 of the discrete distribution 
tally with those given by Irwin on the more restrictive assumption of a continuous 
distribution of which all the moments are finite. But the derivation by this or the 
alternative method cited below is more general and we may regard the corresponding 
expressions for moments of the mean from a continuous u-s-d as a limiting case. 
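$${}_a\beta_1 = \frac{1}{a}\,\beta_1 \tag{3.11}$$

$${}_a\beta_2 = 3 + \frac{1}{a}(\beta_2 - 3) \tag{3.12}$$

$${}_a\beta_3 = \frac{1}{a^2}\big[\beta_3 + 10(a-1)\beta_1\big] \tag{3.13}$$

$${}_a\beta_4 = 15 + \frac{1}{a^2}\big[(\beta_4 - 15) + 15(a-1)(\beta_2 - 3) + 10(a-1)\beta_1\big] \tag{3.14}$$

$${}_a\beta_5 = \frac{1}{a^3}\big[\beta_5 + 21(a-1)\beta_3 + 35(a-1)\beta_2 . \beta_1 + 105(a-1)(a-2)\beta_1\big] \tag{3.15}$$

$${}_a\beta_6 = 105 + \frac{1}{a^3}\big[(\beta_6 - 105) + 28(a-1)(\beta_4 - 15) + 56(a-1)\beta_3 + 35(a-1)(\beta_2 - 3)^2 + 210(a-1)^2(\beta_2 - 3) + 280(a-1)(a-2)\beta_1\big] \tag{3.16}$$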


Regardless of the character of the unit-sample distribution, 
the foregoing expressions for ${}_a\beta_1$, ${}_a\beta_2$ etc. all approach the corresponding 
β-coefficients of the normal distribution, viz: 0, 3, 0, 15, 0, 105, 
when the size (a) of the sample is large. Thus the normal curve is 
likely to give a good fit to the mean-score or score-sum distribution 
of large samples extracted with replacement from any finite universe 
or extracted without replacement from any discrete universe if also 
infinite in the sense defined in § 1. We shall later explore how large a 
must be when n is small. 

Expressions similar to (3.04)—(3.10) are obtainable for the zero 
moments but will contain 2 more terms in virtue of the fact that 
$m_1 = 0$ for any distribution. The iterative method employed to 
derive (3.04) — (3.16) is more economical for the derivation of higher 
moments of a score-sum or mean score replacement distribution or— 
what amounts to the same thing—the distribution of the s-s or m-s 
from a discrete universe which is also infinite. For the derivation 
of lower moments it is not less laborious than an alternative method 
which is applicable to the more general case of nonreplacement 
sampling. In this context, it will suffice to outline the alternative 
procedure within the foregoing framework of assumptions, viz. that 
order of choice of the constituent unit-samples of the a-fold sample 
is immaterial. Whether this is so or otherwise, in our notation 




$${}_a\mu_k = E(x_1 + x_2 + x_3 \ldots + x_{a-1} + x_a)^k \tag{3.17}$$

Expansion of the multinomial on the right leads to a series in which 
there will be a certain number ($C_{k.0}$) of terms of the form $x_u^k$, a certain 
number ($C_{k-1.1}$) of terms of the form $x_u^{k-1}.x_v$ etc. and we may express 
this as follows: 

$$(x_1 + x_2 \ldots x_a)^k = C_{k.0} . x_u^k + C_{k-1.1} . x_u^{k-1} x_v + C_{k-2.2} . x_u^{k-2} x_v^2 + C_{k-2.1.1} . x_u^{k-2} x_v x_w + \ldots \ \text{etc.}$$


If the replacement condition holds good—as it must when the 
universe is infinite 


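$$E(x_u^a\, x_v^b\, x_w^c \ldots) = E(x_u^a) . E(x_v^b) . E(x_w^c) \ldots = \mu_a\, \mu_b\, \mu_c \ldots \tag{3.18}$$

Hence (3.17) becomes: 

$${}_a\mu_k = C_{k.0} . \mu_k + C_{k-1.1} . \mu_{k-1} . \mu_1 + C_{k-2.2} . \mu_{k-2} . \mu_2 + C_{k-2.1.1} . \mu_{k-2} . \mu_1^2 + \ldots \ \text{etc.} \tag{3.19}$$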


In this expression the coefficients $C_{k.0}$, $C_{k-1.1}$, etc. are determinable 
from elementary considerations w.r.t. number-partitions; and in this 
context it will suffice to cite the values relevant to the 
multinomials of order not higher than 4 since our concern is with 
the evaluation of ${}_a\beta_1$ and ${}_a\beta_2$, viz: 

$$\begin{array}{ll}
C_{k.0} = a & C_{3.1} = 4a^{(2)} \\
C_{1.1} = a^{(2)} & C_{2.2} = 3a^{(2)} \\
C_{2.1} = 3a^{(2)} & C_{2.1.1} = 6a^{(3)} \\
C_{1.1.1} = a^{(3)} & C_{1.1.1.1} = a^{(4)}
\end{array} \tag{3.20}$$


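By substitution of (3.20) in (3.19) we can obtain by recourse to 
the usual formulae for $m_k$ in terms of zero moments the results 
cited by (3.04)-(3.10) above, e.g. 

$${}_a\mu_2 = a\,\mu_2 + a^{(2)} \mu_1^2 \tag{3.21}$$

$${}_a\mu_3 = a\,\mu_3 + 3a^{(2)} \mu_2 . \mu_1 + a^{(3)} \mu_1^3 \tag{3.22}$$

$${}_a\mu_4 = a\,\mu_4 + 4a^{(2)} \mu_3 . \mu_1 + 3a^{(2)} \mu_2^2 + 6a^{(3)} \mu_2 . \mu_1^2 + a^{(4)} \mu_1^4 \tag{3.23}$$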


The foregoing derivation depends on the assumption implicit 
in 3.18, viz. that the extraction of any constituent unit sample of 
the a-fold sample does not affect the expected value of any subsequent 
one. Before we can adapt the method to the more general case of 
sampling without replacement, it will be necessary (§ 4) to 
remove this restriction, i.e. to find a meaning for $E(x_u^a . x_v^b . x_w^c \ldots)$ 
when the universe is finite and no replacement occurs. First, it is 
appropriate to recall that our concern in this context has been 
with the score-sum. Hence the signs of $x_1, x_2 \ldots x_a$ in (3.17) are all 
positive. When we come to consider the distribution of the score 
difference of 2 samples from the same universe, we shall also need 
to remove this restriction. 

If the multinomial $(x_1 + x_2 \ldots x_{a+b})^k$ contains (a + b) 
terms all of which are positive, the expressions for $C_{k.0}$, $C_{k-1.1}$ etc. 
are obtainable for k = 2, 3, 4 from (3.20) by substituting (a+b) 
for a. If a are positive and b are negative, we may write the corresponding 
coefficients as: 

$$\begin{array}{ll}
H_{2.0} = (a+b) & H_{4.0} = (a+b) \\
H_{1.1} = (a-b)^{[2]} & H_{3.1} = 4(a-b)^{[2]} \\
H_{3.0} = (a-b) & H_{2.2} = 3(a+b)^{(2)} \\
H_{2.1} = 3a^{(2)} - 3b^{(2)} & H_{2.1.1} = 6\big(a^{(3)} - b\,a^{(2)} - a\,b^{(2)} + b^{(3)}\big) \\
H_{1.1.1} = (a-b)^{[3]} & H_{1.1.1.1} = (a-b)^{[4]}
\end{array} \tag{3.24}$$


In the above we have employed an economical symbolism for 
factorial power series involving alternate positive and negative 
terms. The Vandermonde expansion of $(a+b)^{(r)}$ in its customary form 
presupposes that a and b are both positive integers; but we can 
interpret it correctly if we write: 

$$(a-b)^{(r)} = \sum_{k=0}^{k=r} r_{(k)}\, a^{(r-k)}\, (-b)^{(k)}$$

This would be strictly analogous to the binomial expansion $(a-b)^r$ if 
it were true that $(-b)^{(k)} = (-1)^k b^{(k)}$. For brevity, we shall use 
square brackets as above for a summation in factorial powers analogous 
to the expansion of $(a-b)^r$ on the assumption that a and b are 
both integers, viz.: 

$$(a-b)^{[r]} = \sum_{k=0}^{k=r} (-1)^k\, r_{(k)}\, a^{(r-k)}\, b^{(k)} \tag{3.25}$$
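The square-bracket symbol of (3.25) can be given a concrete reading. The following sketch is an interpretation, not the authors' derivation: it computes $(a-b)^{[r]}$ from the alternating factorial-power sum and compares it with a brute-force count of ordered selections of r distinct items from a positive and b negative ones, each selection weighted by the product of its signs. Function names are illustrative.

```python
from itertools import permutations
from math import comb, prod

def falling(x, r):
    """Factorial power x(x-1)...(x-r+1); zero when r exceeds a non-negative integer x."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

def bracket_power(a, b, r):
    """(a-b)^[r] as the alternating sum of factorial powers in (3.25)."""
    return sum((-1) ** k * comb(r, k) * falling(a, r - k) * falling(b, k)
               for k in range(r + 1))

def signed_count(a, b, r):
    """Sum of sign products over ordered r-tuples of distinct items."""
    signs = [1] * a + [-1] * b
    return sum(prod(signs[i] for i in idx)
               for idx in permutations(range(a + b), r))

print(bracket_power(4, 3, 2), signed_count(4, 3, 2))
```

On this reading the coefficient $H_{1.1} = (a-b)^{[2]}$ of (3.24) is simply the signed number of ordered pairs of distinct unit samples.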
4. Score-sum and mean score non-replacement distribution. 

To apply the method of (3.17) et seq. to the analysis of sampling 
from the finite N-fold universe without replacement we have to 
find a meaning for $E(x_u^a . x_v^b . x_w^c \ldots)$ with due regard to the fact that 
the value of $x_u$ in a particular sample places a restriction on the 
possible value of $x_v$ in the same one. Let us first consider the expression 
$E(x_u^k)$, which we may consider in this context as a function 
of u alone, so that $E(x_u^k) = E_u(x_u^k)$. This is the mean value of the 
k-th power of the unit sample at the u-th draw, i.e. after extraction 
of (u−1) unit samples, and hence from a residual universe containing 
(N−u+1) items, whence 


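$$E_u(x_u^k) = \frac{N\mu_k - {}_{(u-1)}x_0^k}{N-u+1} \tag{4.01}$$

If v = (u+1), we must interpret the operation $E_{v.u}$ to signify the 
mean value of the unit score from an (N−u)-fold residual universe, i.e. 

$$E_{v.u}(x_v^k) = \frac{N\mu_k - {}_u x_0^k}{N-u}$$

In the notation of § 2: 

$$E(x_v^k) = E_u . E_{v.u}(x_v^k)$$

Hence from (4.01): 

$$E(x_{u+1}^k) = \frac{N\mu_k - E({}_{(u-1)}x_0^k) - E(x_u^k)}{N-u} = \frac{(N-u+1)\,E(x_u^k) - E(x_u^k)}{N-u} = E(x_u^k)$$

$$E(x_{u+1}^k) = E(x_u^k) = E(x_1^k) = \mu_k \tag{4.02}$$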


Thus the mean value of the k-th power of the unit sample extracted 
at any draw is the k-th zero moment of the unit sample distribution. 
To interpret product terms involving powers of different unit samples, 
we proceed in the same way; but shall place no restriction on 
either v or u other than that: (a) they are unequal; (b) each lies 
in the range 1 to r when r is the sample size. For a product of order 


(a, b) we may write 
$$E(x_u^a . x_v^b) = E_u\big[x_u^a . E_{v.u}(x_v^b)\big]$$

Here $E_{v.u}(\ldots)$ signifies extracting the mean value of the score 
power from a universe which does not contain $x_u$, being therefore an 
(N−1)-fold universe. Since choice order does not affect its value, we 
may therefore write in virtue of (2.05)-(2.06), with $N^{(2)} = N(N-1)$: 

$$E(x_u^a . x_v^b) = E_u\Big[x_u^a . \frac{N\mu_b - x_u^b}{N-1}\Big] = \frac{N^2}{N^{(2)}}\,\mu_b . E_u(x_u^a) - \frac{N}{N^{(2)}}\, E_u(x_u^{a+b})$$

Whence from 4.02 

$$E(x_u^a . x_v^b) = \mu_{a.b} = \frac{N^2}{N^{(2)}}\,\mu_a\,\mu_b - \frac{N}{N^{(2)}}\,\mu_{a+b} \tag{4.03}$$
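Both (4.02) and (4.03) lend themselves to an exhaustive check on a small universe. The sketch below is an illustration under stated assumptions: it enumerates every ordered pair of distinct items from a hypothetical 5-item universe and compares the exact mean of $x_u^a x_v^b$ with the right-hand side of (4.03), reading $N^{(2)}$ as N(N−1). Setting a = 0 reduces the check to (4.02) for the second draw.

```python
from fractions import Fraction
from itertools import permutations

def mu(universe, k):
    """k-th zero moment of the unit-sample distribution."""
    return Fraction(sum(x ** k for x in universe), len(universe))

def product_moment(universe, a, b):
    """Exact mean of x_u^a * x_v^b over all ordered pairs of distinct items."""
    pairs = list(permutations(universe, 2))
    return Fraction(sum(x ** a * y ** b for x, y in pairs), len(pairs))

def formula_403(universe, a, b):
    """Right-hand side of (4.03) with N^(2) read as N(N-1)."""
    N = len(universe)
    N2 = N * (N - 1)
    return (Fraction(N * N, N2) * mu(universe, a) * mu(universe, b)
            - Fraction(N, N2) * mu(universe, a + b))

universe = [0, 1, 1, 2, 5]   # hypothetical finite universe, N = 5
print(product_moment(universe, 2, 1) == formula_403(universe, 2, 1))
print(product_moment(universe, 0, 3) == mu(universe, 3))
```

Items are distinguished by position, so the two items of score 1 count as separate entities, in keeping with the labelling convention of § 2.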


In the same way, we may write: 


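$$E(x_u^a . x_v^b . x_w^c) = E_u\big\{x_u^a . E_{v.u}\big[x_v^b . E_{w.uv}(x_w^c)\big]\big\}$$

$$E_{w.uv}(x_w^c) = \frac{N\mu_c - x_u^c - x_v^c}{N-2} \ \text{etc.}$$

Similarly, we write as follows to evaluate $E(x_u^a . x_v^b . x_w^c . x_z^d)$: 

$$E_{z.uvw}(x_z^d) = \frac{N\mu_d - x_u^d - x_v^d - x_w^d}{N-3}$$

Whence we obtain: 

$$E(x_u^a . x_v^b . x_w^c) = \mu_{a.b.c} = \frac{N^3}{N^{(3)}}\,\mu_a . \mu_b . \mu_c - \frac{N^2}{N^{(3)}}\big\{\mu_{a+b}\,\mu_c + \mu_{a+c}\,\mu_b + \mu_{b+c}\,\mu_a\big\} + \frac{2N}{N^{(3)}}\,\mu_{a+b+c} \tag{4.04}$$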
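$$\begin{aligned}
E(x_u^a . x_v^b . x_w^c . x_z^d) = \mu_{a.b.c.d} ={}& \frac{N^4}{N^{(4)}}\,\mu_a \mu_b \mu_c \mu_d \\
&- \frac{N^3}{N^{(4)}}\big\{\mu_{a+b}\mu_c\mu_d + \mu_{a+c}\mu_b\mu_d + \mu_{a+d}\mu_b\mu_c + \mu_{b+c}\mu_a\mu_d + \mu_{b+d}\mu_a\mu_c + \mu_{c+d}\mu_a\mu_b\big\} \\
&+ \frac{N^2}{N^{(4)}}\big\{\mu_{a+b}\mu_{c+d} + \mu_{a+c}\mu_{b+d} + \mu_{a+d}\mu_{b+c} + 2\mu_a\mu_{b+c+d} + 2\mu_b\mu_{a+c+d} + 2\mu_c\mu_{a+b+d} + 2\mu_d\mu_{a+b+c}\big\} \\
&- \frac{6N}{N^{(4)}}\,\mu_{a+b+c+d}
\end{aligned} \tag{4.05}$$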


From (3.17) and (3.20) we may express the first 4 zero moments 
of the r-fold sample in the form: 
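$$\begin{aligned}
{}_r\mu_1 &= r\,\mu_1 \\
{}_r\mu_2 &= r\,\mu_2 + r^{(2)} \mu_{1.1} \\
{}_r\mu_3 &= r\,\mu_3 + 3r^{(2)} \mu_{2.1} + r^{(3)} \mu_{1.1.1} \\
{}_r\mu_4 &= r\,\mu_4 + 4r^{(2)} \mu_{3.1} + 3r^{(2)} \mu_{2.2} + 6r^{(3)} \mu_{2.1.1} + r^{(4)} \mu_{1.1.1.1}
\end{aligned} \tag{4.06}$$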


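From (4.03)-(4.05) we have: 

$$\begin{aligned}
\mu_{1.1} &= \frac{1}{N^{(2)}}\big[N^2\mu_1^2 - N\mu_2\big] \\
\mu_{2.1} &= \frac{1}{N^{(2)}}\big[N^2\mu_2\mu_1 - N\mu_3\big] \\
\mu_{2.2} &= \frac{1}{N^{(2)}}\big[N^2\mu_2^2 - N\mu_4\big] \\
\mu_{3.1} &= \frac{1}{N^{(2)}}\big[N^2\mu_3\mu_1 - N\mu_4\big] \\
\mu_{1.1.1} &= \frac{1}{N^{(3)}}\big[N^3\mu_1^3 - 3N^2\mu_2\mu_1 + 2N\mu_3\big] \\
\mu_{2.1.1} &= \frac{1}{N^{(3)}}\big[N^3\mu_2\mu_1^2 - 2N^2\mu_3\mu_1 - N^2\mu_2^2 + 2N\mu_4\big] \\
\mu_{1.1.1.1} &= \frac{1}{N^{(4)}}\big[N^4\mu_1^4 - 6N^3\mu_2\mu_1^2 + 3N^2\mu_2^2 + 8N^2\mu_3\mu_1 - 6N\mu_4\big]
\end{aligned} \tag{4.07}$$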
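The higher product moments can be verified in the same exhaustive manner. The sketch below is illustrative only: it evaluates $\mu_{2.1.1}$ and $\mu_{1.1.1.1}$ for a hypothetical 6-item universe, once by averaging over all ordered draws of distinct items and once from the bracketed expressions, with $N^{(3)}$ and $N^{(4)}$ read as factorial powers of N.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def mu(universe, k):
    """k-th zero moment of the unit-sample distribution."""
    return Fraction(sum(x ** k for x in universe), len(universe))

def product_moment(universe, powers):
    """Exact mean of the product of powers over ordered draws of distinct items."""
    draws = list(permutations(universe, len(powers)))
    total = sum(prod(x ** p for x, p in zip(d, powers)) for d in draws)
    return Fraction(total, len(draws))

def mu_211(universe):
    """mu_{2.1.1} in terms of unit-sample zero moments."""
    N = len(universe)
    m1, m2, m3, m4 = (mu(universe, k) for k in (1, 2, 3, 4))
    num = N**3 * m2 * m1**2 - 2 * N**2 * m3 * m1 - N**2 * m2**2 + 2 * N * m4
    return num / (N * (N - 1) * (N - 2))

def mu_1111(universe):
    """mu_{1.1.1.1} in terms of unit-sample zero moments."""
    N = len(universe)
    m1, m2, m3, m4 = (mu(universe, k) for k in (1, 2, 3, 4))
    num = (N**4 * m1**4 - 6 * N**3 * m2 * m1**2 + 3 * N**2 * m2**2
           + 8 * N**2 * m3 * m1 - 6 * N * m4)
    return num / (N * (N - 1) * (N - 2) * (N - 3))

universe = [0, 1, 1, 2, 3, 5]   # hypothetical finite universe, N = 6
print(product_moment(universe, (2, 1, 1)) == mu_211(universe))
print(product_moment(universe, (1, 1, 1, 1)) == mu_1111(universe))
```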


By substitution in (4.06) we therefore obtain the first 4 zero moments 
of the r-fold score-sum distribution in terms of those of the unit- 


sample distribution, viz.: 
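$${}_r\mu_2 = \frac{r(N-r)}{(N-1)}\,\mu_2 + \frac{N r^{(2)}}{(N-1)}\,\mu_1^2 \tag{4.08}$$

$${}_r\mu_3 = \frac{r(N-r)(N-2r)}{(N-1)^{(2)}}\,\mu_3 + \frac{3N(N-r)\,r^{(2)}}{(N-1)^{(2)}}\,\mu_2\mu_1 + \frac{N^2 r^{(3)}}{(N-1)^{(2)}}\,\mu_1^3 \tag{4.09}$$

$$\begin{aligned}
{}_r\mu_4 ={}& \frac{r(N-r)}{(N-1)^{(3)}}\big[(N-2r)(N-3r) - N(r-1)\big]\mu_4 \\
&+ \frac{4N(N-r)(N-2r+1)\,r^{(2)}}{(N-1)^{(3)}}\,\mu_3\mu_1 + \frac{6N^2 r^{(3)}(N-r)}{(N-1)^{(3)}}\,\mu_2\mu_1^2 \\
&+ \frac{N^3 r^{(4)}}{(N-1)^{(3)}}\,\mu_1^4 + \frac{3N r^{(2)}(N-r)(N-r-1)}{(N-1)^{(3)}}\,\mu_2^2
\end{aligned} \tag{4.10}$$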


If we now convert these into the corresponding mean moments 
in the usual way, we obtain: 
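$${}_r m_2 = \frac{r(N-r)}{(N-1)}\,m_2 \tag{4.11}$$

$${}_r m_3 = \frac{r(N-r)(N-2r)}{(N-1)(N-2)}\,m_3 \tag{4.12}$$

$${}_r m_4 = \frac{r(N-r)\big[(N-2r)(N-3r) - N(r-1)\big]}{(N-1)(N-2)(N-3)}\,m_4 + \frac{3N r^{(2)}(N-r)(N-r-1)}{(N-1)^{(3)}}\,m_2^2 \tag{4.13}$$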
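The mean moments (4.11)-(4.13) can be confirmed by enumerating every possible r-fold sample. The following sketch is illustrative: the 8-item universe is hypothetical, and $(N-1)^{(3)}$ and $r^{(2)}$ are read as the factorial powers (N−1)(N−2)(N−3) and r(r−1).

```python
from fractions import Fraction
from itertools import combinations

def mean_moment(values, k):
    """Exact k-th moment about the mean of an equally weighted list of values."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** k for v in values) / n

def enumerated_moments(universe, r):
    """Mean moments of the r-fold score sum over all samples without replacement."""
    sums = [sum(c) for c in combinations(universe, r)]
    return {k: mean_moment(sums, k) for k in (2, 3, 4)}

def formula_moments(universe, r):
    """Mean moments of the r-fold score sum from (4.11)-(4.13)."""
    N = len(universe)
    m2, m3, m4 = (mean_moment(universe, k) for k in (2, 3, 4))
    d3 = (N - 1) * (N - 2) * (N - 3)
    f2 = Fraction(r * (N - r), N - 1) * m2
    f3 = Fraction(r * (N - r) * (N - 2 * r), (N - 1) * (N - 2)) * m3
    f4 = (Fraction(r * (N - r) * ((N - 2 * r) * (N - 3 * r) - N * (r - 1)), d3) * m4
          + Fraction(3 * N * r * (r - 1) * (N - r) * (N - r - 1), d3) * m2 ** 2)
    return {2: f2, 3: f3, 4: f4}

universe = [0, 0, 1, 1, 1, 2, 4, 7]   # hypothetical skew universe, N = 8
r = 3
print(enumerated_moments(universe, r) == formula_moments(universe, r))
```

Since every combination of r items is equally likely, averaging over `itertools.combinations` gives the exact sampling distribution of the score sum.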


Hence we may express the first and second Pearson coefficients, 
${}_r\beta_1$ and ${}_r\beta_2$ of the r-fold sample distribution in terms of the corresponding 
coefficients of the unit sampling distribution as follows: 

$${}_r\beta_1 = \frac{(N-1)(N-2r)^2}{r(N-r)(N-2)^2}\,\beta_1 \tag{4.14}$$

$${}_r\beta_2 = \frac{(N-1)\big[(N-2r)(N-3r) - N(r-1)\big]}{r(N-r)(N-2)(N-3)}\,\beta_2 + \frac{3N(N-1)(N-r-1)(r-1)}{r(N-r)(N-2)(N-3)} \tag{4.15}$$


If N is so large that we can neglect $N^{-1}$ we may simplify the above 
by the substitution of the sampling fraction $F = rN^{-1}$, viz.: 

$${}_r\beta_1 \simeq \frac{(1-2F)^2}{r(1-F)}\,\beta_1 \tag{4.16}$$

$${}_r\beta_2 \simeq 3\Big[1 - \frac{1}{r(1-F)}\Big] + \frac{1 - 6F(1-F)}{r(1-F)}\,\beta_2 \tag{4.17}$$


The hypergeometric distribution for sampling in the binary 
taxonomic domain is, of course, a particular case of the distribution 
whose first two β-coefficients are as defined by (4.14)-(4.15). For 
a universe with score classes 0 and 1, the respective frequencies 
being q and p, the k-th mean moment of the u-s-d is $m_k = q(-p)^k + p\,q^k$, 
whence $\beta_1 = (q-p)^2/pq$ and $\beta_2 = (1-3pq)/pq$ for the unit sample 
distribution, and the first two β-coefficients of the r-fold sample 
distribution are: 

$${}_r\beta_1 = \frac{(N-1)(N-2r)^2}{r(N-r)(N-2)^2}\cdot\frac{(q-p)^2}{pq} \tag{4.18}$$

$${}_r\beta_2 = \frac{(N-1)}{(N-2)(N-3)\,r(N-r)\,pq}\Big[N(N+1) - 6r(N-r) + 3pq\big\{N^2(r-2) - Nr^2 + 6r(N-r)\big\}\Big] \tag{4.19}$$

The relations defined by (4.14)-(4.15) are equally appropriate 
to the score sum and the mean score distribution and (4.18)—(4.19) 
to the raw score and the proportionate score. The k-th mean moment 
of the mean (or proportionate) score of the r-fold sample distribution 
is, of course, obtainable from the foregoing formulae for those of the 
score-sum (or raw score) by applying the appropriate scalar factor 
($r^{-k}$). Comparison of (4.14)-(4.15) with (3.11)-(3.12) brings into 
focus fundamental differences between sampling with and without 
replacement from any finite universe. In either case the r-fold sample 
score distribution must be symmetrical if the unit sample distribution 
is also. In either case, the kurtosis is approximately normal (${}_r\beta_2 \simeq 3.0$) 
if r is large and not more than ⅘ (vide infra) as large as N. Here 
the resemblance ends. If we sample without replacement ${}_r\beta_1$ and ${}_r\beta_2$ 
both become infinite when r = N, as we should expect since there 
is then only one sample score-sum with non-zero frequency. 

Regardless of the structure of a skew finite universe sampling 
with replacement implies that the distribution becomes progressively 
less skew as the size of the sample increases; but ${}_r\beta_1$ vanishes if the 
sampling fraction is 0.5 when we sample without replacement. 
This of itself, though a necessary, is not a sufficient condition of 
symmetry but the symmetry of the distribution when F = ½ follows 
from elementary principles, if we bear in mind the fact that each 
combination of a out of N letters corresponds to the same number, 
i.e. a!, of permutations. Hence the frequencies of score-sums will 
be in the same ratio as the numbers of combinations of items whose 
total score is the same. If a = ½N, there will be a unique combination 
with score sum $S_N - s_a$ (that of the items left in the residual universe, 
$S_N$ being the score sum of the whole universe) for each unique combination 
with score sum $s_a$. Thus scores of $s_a$ and $S_N - s_a$ will occur with equal frequency 
when a = ½N. The mean score sum is then $\tfrac{1}{2}S_N$ and the deviation 
of $s_a$ therefrom is $(s_a - \tfrac{1}{2}S_N) = +S_a$. That of each corresponding 
combination whose score sum is $(S_N - s_a)$ will be $(S_N - s_a - \tfrac{1}{2}S_N) = 
(\tfrac{1}{2}S_N - s_a) = -S_a$. Thus score deviations of $+S_a$ and $-S_a$ must occur 
with equal frequency. 
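The complementarity argument may be illustrated by direct enumeration. The sketch below is a minimal Python check on a small, arbitrarily chosen universe of six scores (the particular scores and the function name are hypothetical): it counts the combinations yielding each score sum of a half-universe sample and confirms that sums s and S_N − s occur equally often.

```python
from collections import Counter
from itertools import combinations


def score_sum_counts(scores, a):
    """Number of a-fold combinations (without replacement) per score sum."""
    return Counter(sum(c) for c in combinations(scores, a))


# Hypothetical skew universe of N = 6 items; half-universe sample a = 3.
scores = [0, 1, 1, 2, 5, 9]
counts = score_sum_counts(scores, len(scores) // 2)
S_N = sum(scores)

if __name__ == "__main__":
    for s in sorted(counts):
        # Each sum s is paired with the sum S_N - s of the complementary sample.
        print(s, counts[s], S_N - s, counts[S_N - s])
```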

From (4.15) we see that ${}_r\beta_2$ must be less than 3.0 when
F = ½, at which level we may therefore expect Type II to give the best
description of the sampling distribution if any continuous curve of the
Pearson system is suitable. An examination of the approximate
formula (4.17) brings into focus what is perhaps a more remarkable
feature of non-replacement sampling distributions than the symmetry
of that of the half-universe sample. The expression 1 − 6F(1 − F)
vanishes when F = ½ ± 1/√12, i.e. F ≈ 0.21 or 0.79, between which
limits the coefficient of $\beta_2$ in the second term of (4.17) is
negative with a numerical maximum for F ≈ 0.59. Regardless of the
structure of the universe, the sampling distribution will always be
platykurtic unless the sampling fraction in round numbers is greater
than four-fifths; ceteris paribus, below this level a highly
leptokurtic unit-sample distribution will generate a more platykurtic
sampling distribution than a distribution which is initially flatter
than the normal curve, e.g. a rectangular or indeed even a U-shaped
one. At F ≈ 0.59 the kurtosis is a minimum and (4.17) is approximately


$${}_r\beta_2 \approx 3 - \frac{3 + 1.1\,\beta_2}{r}$$
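The numerical landmarks quoted above can be reproduced with a few lines of Python. The sketch assumes that, for a fixed sample size r, the coefficient of β2 in the second term of (4.17) is proportional to {1 − 6F(1 − F)} ÷ (1 − F); this form is an assumption, since (4.17) itself is not reproduced here, but it is consistent with the roots, the location of the extremum and the factor 1.1 cited in the text. The function name is illustrative.

```python
from math import sqrt


def beta2_coefficient(F):
    """Assumed coefficient of beta2 (times r) in the large-N formula:
    {1 - 6F(1 - F)} / (1 - F)."""
    return (1 - 6 * F * (1 - F)) / (1 - F)


# Sampling fractions at which 1 - 6F(1 - F) vanishes.
roots = (0.5 - 1 / sqrt(12), 0.5 + 1 / sqrt(12))

# Writing u = 1 - F the coefficient is 1/u - 6 + 6u, which is least at
# u = 1/sqrt(6), i.e. F = 1 - 1/sqrt(6).
F_min = 1 - 1 / sqrt(6)

if __name__ == "__main__":
    print(roots)
    print(F_min, beta2_coefficient(F_min))
```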


This seemingly paradoxical characteristic of any non-replacement
distribution comes into focus (fig. 1) if we calculate from the exact
formula (4.15) the kurtosis for samples of different sizes extracted
without replacement from a symmetrical 24-fold universe of only
3 score classes. For simplicity, we may assign to the 3 classes scores
of −1, 0 and +1, and frequencies as below, with $\beta_2$ values for
the unit-sample distribution in the range 1–12 including
a rectangular and a U-shaped contour at the lower limit of kurtosis.


  No. of items with score             No. of items with score
    −1     0    +1     β2               −1     0    +1     β2
     1    22     1    12.0               5    14     5     2.4
     2    20     2     6.0               6    12     6     2.0
     3    18     3     4.0               8     8     8     1.5
     4    16     4     3.0              11     2    11     1.1
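The kurtosis of the r-fold score sum for each of the universes tabulated above can also be obtained without recourse to any formula, by summing over the possible compositions of the sample. The following Python sketch does so in exact arithmetic; the function name and its arguments are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def sample_kurtosis(k, r, N=24):
    """Exact beta2 of the score sum of an r-fold sample (1 <= r <= N - 1)
    drawn without replacement from N items: k items score -1, k items
    score +1 and the remaining N - 2k items score 0."""
    zeros = N - 2 * k
    total = comb(N, r)
    m2 = m4 = Fraction(0)
    for i in range(0, min(k, r) + 1):            # items of score -1 drawn
        for j in range(0, min(k, r - i) + 1):    # items of score +1 drawn
            rest = r - i - j                     # items of score 0 drawn
            if rest > zeros:
                continue
            p = Fraction(comb(k, i) * comb(k, j) * comb(zeros, rest), total)
            d = j - i                            # the mean is zero by symmetry
            m2 += p * d ** 2
            m4 += p * d ** 4
    return m4 / m2 ** 2


if __name__ == "__main__":
    for k in (1, 2, 3, 4, 5, 6, 8, 11):
        print(k, [round(float(sample_kurtosis(k, r)), 2) for r in (1, 6, 12, 18, 23)])
```

For the most leptokurtic universe (k = 1, β2 = 12) the half-universe sample has a kurtosis of only 23/12, which illustrates the platykurtic tendency discussed in the text.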

[Fig. 1. 24-fold Symmetrical Universes of 3 classes (scores −1, 0, +1).]
The picture disclosed by fig. 1 raises the question: what lower limit
may ${}_r\beta_2$ attain if $\beta_2$ is as high as may be? This admits
no simple answer because the size of the universe itself sets a limit
both to the maximum value of $\beta_2$ and to the value of r consistent
with the condition that the second term in (4.17) is both negative and
numerically maximal, as when F ≈ 0.6. Thus a universe of 100 items
assignable to 3 equally spaced score classes (−1, 0, +1) as above
cannot have a kurtosis greater than 50, and the sample size consistent
with a minimum value of ${}_r\beta_2$ is about 60.

For the rectangular universe $\beta_1 = 0$ and $\beta_2 \approx 1.8$
when n = N is large. The possibility of generating a rectangular
sampling distribution when F = 0.6 therefore implies that
$3 + 1.1\,\beta_2 = 1.2r$. If N = 200 and F = 0.6, r = 120, so that
${}_r\beta_2$ could satisfy this relation only if $\beta_2 \approx 130$.
For the 200-fold binomial universe defined by $(0.995 + 0.005)^1$,
$\beta_2$ exceeds 130 but no other binomial 200-fold universe and
no 200-fold universe of more than 2 non-zero classes can satisfy
the condition $\beta_2 > 130$. From a 2-class universe of 1 zero score
value and 199 unit score values the value of ${}_r\beta_2$ for the
120-fold sample would be 1.17; but the sample itself would contain only
2 score classes (viz. score sums of 120 and 119) as we see by expanding
$(199 + 1)^{(120)}$. Though ${}_r\beta_2$ is in this case less than 1.8,
the distribution of the sample score, having only 2 classes, is
necessarily monotonic.

From (4.14)–(4.15) we see that the first two Pearson coefficients
of the r-fold and the (N−r)-fold sample are respectively identical.
Thus ${}_r\beta_1 = \beta_1$ and ${}_r\beta_2 = \beta_2$ when
r = (N−1). If $\beta_2$ lies in the neighbourhood of 3(N−1) ÷ (N+1)
the kurtosis of the r-fold distribution does not appreciably change
within the range r = 1 to r = N−1; e.g. when N = 24 and
$\beta_2 = 2.76$ (see fig. 2).

For reasons which will be apparent later, in this context we may
speak of a universe as a large one if N > 100 and n > 7. Of such a
universe we may then say that ${}_r\beta_1$ necessarily has the same
value as the normal distribution when F = ½ and ${}_r\beta_2$ is nearly
3 when F ≈ 0.2. It is therefore of interest to exhibit the magnitudes
of ${}_r\beta_1$ and ${}_r\beta_2$ when the sampling fraction lies
midway between these two limits, viz. F = 7/20. We then derive from
(4.16) and (4.17):

$${}_r\beta_1 \approx \frac{9}{65r}\,\beta_1; \qquad {}_r\beta_2 \approx 3 - \frac{73}{130r}\,\beta_2$$

If N = 100, r = 35 for this value of F, in which case the numerical
values of the above are $0.004\,\beta_1$ and $3 - 0.016\,\beta_2$.

It is clear from this that the Pearson coefficients of a one-third
non-replacement sample from a sizeable universe will differ very
little from their normal values. However the suitability of any
continuous curve as a descriptive function of a discrete sampling
distribution will be limited by the number of r-fold sample score
classes and these may diminish with the size of sample if there is no
replacement. We shall refer to this restriction more explicitly at a
later stage. Meanwhile it is of interest to examine the implications
of the expressions derived above when the sampling fraction is
much smaller, e.g. F = 1/6. We then find that
${}_r\beta_1 \approx \frac{8}{15r}\,\beta_1$ and
${}_r\beta_2 \approx 3 + \frac{1}{5r}\,\beta_2$. In this case for
N = 100, r ≈ 17 and then ${}_r\beta_1 \approx 0.03\,\beta_1$ and
${}_r\beta_2 \approx 3 + 0.01\,\beta_2$. For values of $\beta_2$ in the
range 2 to 5 and of $\beta_1$ in the range 2/3 to 5/3 the values of
${}_r\beta_1$ and ${}_r\beta_2$ correspond to the Poisson values in the
range M = 50 to 20. Actually the normal distribution tallies closely
with the Poisson at the 2σ level when M > 10.

[Fig. 2. Variation of Kurtosis Coefficient (${}_r\beta_2$) with size of
sampling fraction (F): sampling without replacement from a 3-class
universe of 24 items.]


5. Non-replacement score difference distributions. 


When we turn to the distribution of the difference between the
a-fold and the b-fold score sum, the distinction mentioned at the
end of §4 becomes important; and it will be the topic of further
consideration in the second communication of this series. If our
sample score is the score-sum (i.e. the raw score in the binary
taxonomic domain) we may write the difference in accordance with
the notation of §2 as

$$(x_1 + x_2 + \dots + x_a) - (x_{a+1} + x_{a+2} + \dots + x_{a+b})$$

Hence for the k-th zero moment of the difference distribution we have:

$${}_{(a-b)}\mu_k = E(x_1 + x_2 + \dots + x_a - x_{a+1} - x_{a+2} - \dots - x_{a+b})^k \qquad (5.01)$$

If our sample score is the mean-score (i.e. proportionate score in the
binary taxonomic domain), the difference (d) is:

$$d = \frac{1}{ab}\,(b\,x_1 + b\,x_2 + \dots + b\,x_a - a\,x_{a+1} - a\,x_{a+2} - \dots - a\,x_{a+b})$$

The k-th zero moment of the mean score difference is then:

$$E(d^k) = \frac{1}{a^k b^k}\,E(b\,x_1 + b\,x_2 + \dots + b\,x_a - a\,x_{a+1} - a\,x_{a+2} - \dots - a\,x_{a+b})^k \qquad (5.02)$$
Each of the foregoing expressions (5.01) and (5.02) invites examina- 
tion of three cases when the universe is both finite and discrete: 


(a) sampling without replacement from one and the same
universe;

(b) extraction without replacement of the a-fold and b-fold
samples from two identical but distinct universes;

(c) sampling with replacement either from one and the same
universe or from 2 identical universes.

The last case is formally the same as when we sample with
or without replacement from a discrete infinite universe, the
distinction between extracting the two samples from 2 identical
universes or from one and the same universe being then irrelevant,
since the two samples are ex hypothesi independent in either case.
We shall defer more detailed consideration of (c) to subsequent
communications concerned with sampling from the discrete universe,
since it is easy to derive by the iterative method of §3 expressions
for the first six Pearson coefficients in the same form as (3.11)–
(3.16). Here it suffices to say that (c) implies:


$$E(x_1^a \cdot x_2^b \cdot x_3^c \dots) = \mu_{a.b.c\dots} = \mu_a \cdot \mu_b \cdot \mu_c \dots \qquad (5.03)$$


The expansion of (5.01) then proceeds as for the derivation of 
(3.21)—(3.23) from (3.17) if we substitute the H-coefficients of 
(3.24) for the C-coefficients of (3.20), e.g. 


(a—by He = Hy of, + Hy at 
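Thus we derive at one step:

$$\begin{aligned}
{}_{(a-b)}\mu_1 &= (a-b)\,\mu_1 \\
{}_{(a-b)}\mu_2 &= (a+b)\,\mu_2 + (a-b)^{(2)}\mu_1^2 \\
{}_{(a-b)}\mu_3 &= (a-b)\,\mu_3 + 3\,(a^{(2)} - b^{(2)})\,\mu_2\,\mu_1 + (a-b)^{(3)}\mu_1^3 \\
{}_{(a-b)}\mu_4 &= (a+b)\,\mu_4 + 4(a-b)^{(2)}\mu_3\,\mu_1 + 3(a+b)^{(2)}\mu_2^2 \\
&\quad + 6\,(a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)})\,\mu_2\,\mu_1^2 + (a-b)^{(4)}\mu_1^4
\end{aligned} \qquad (5.04)$$

Here $(a-b)^{(k)}$ stands for the binomial expansion of $(a-b)^k$ in
which each power is replaced by the corresponding factorial power,
e.g. $(a-b)^{(2)} = a^{(2)} - 2ab + b^{(2)}$ with $a^{(2)} = a(a-1)$.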


For the development of (5.02) on the same assumption, i.e. 
sampling with replacement in accordance with (5.03), we need 
expressions analogous to the H-coefficients of (3.20) when each unit 
sample score is a product of which one factor is either b or —a. 
We then write (5.02) in the form:

$$E(d^2) = \frac{1}{a^2 b^2}\,[P_2\,\mu_2 + P_{1.1}\,\mu_1^2] \quad \text{etc.}$$


For the derivation of the first 4 zero moments of the proportionate
score difference on the replacement assumption or the assumption
that N is indefinitely large, it will suffice to cite:

$$\begin{aligned}
P_2 &= ab\,(a+b) \\
P_{1.1} &= -ab\,(a+b) \\
P_3 &= ab\,(b^2 - a^2) \\
P_{2.1} &= -3ab\,(b^2 - a^2) \\
P_{1.1.1} &= 2ab\,(b^2 - a^2) \\
P_4 &= ab\,(a^3 + b^3) \\
P_{3.1} &= -4ab\,(a^3 + b^3) \\
P_{2.2} &= 3ab\,\{ab(a^2 + b^2) - (a^3 + b^3) + 2a^2b^2\} \\
P_{2.1.1} &= 6ab\,\{2(a^3 + b^3) - ab(b^2 + a^2) - 2a^2b^2\} \\
P_{1.1.1.1} &= 3ab\,\{ab(a^2 + b^2) + 2a^2b^2 - 2(a^3 + b^3)\}
\end{aligned} \qquad (5.05)$$


If we sample without replacement from one and the same universe,
we may derive the moments of the difference distribution of the
score-sum and of the mean score in the same way except insofar as
we interpret the co-moments $\mu_{1.1}$, $\mu_{2.1}$ etc. as in §4.
We thus obtain for the score-sum (or raw score) difference
distribution:

$${}_{(a-b)}\mu_2 = \frac{\{(a+b)N - (a-b)^2\}\,N\mu_2 + N^2(a-b)^{(2)}\mu_1^2}{N^{(2)}} \qquad (5.06)$$

$${}_{(a-b)}\mu_3 = \frac{(a-b)\big[\{N^2 - 3N(a+b) + 2(a-b)^2\}\,N\mu_3 + 3N^2\{N(a+b-1) - (a-b)^2 + (a+b)\}\,\mu_2\,\mu_1 + N^3\{(a-b)^2 - 3(a+b) + 2\}\,\mu_1^3\big]}{N^{(3)}} \qquad (5.07)$$


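$$\begin{aligned}
{}_{(a-b)}\mu_4 &= \Big[(a+b) - \frac{4(a-b)^{(2)} + 3(a+b)^{(2)}}{N-1} + \frac{12\,(a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)})}{(N-1)^{(2)}} - \frac{6(a-b)^{(4)}}{(N-1)^{(3)}}\Big]\mu_4 \\
&\quad + \frac{4N}{N-1}\Big[(a-b)^{(2)} - \frac{3\,(a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)})}{N-2} + \frac{2(a-b)^{(4)}}{(N-2)^{(2)}}\Big]\mu_3\,\mu_1 \\
&\quad + \frac{3N}{N-1}\Big[(a+b)^{(2)} - \frac{2\,(a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)})}{N-2} + \frac{(a-b)^{(4)}}{(N-2)^{(2)}}\Big]\mu_2^2 \\
&\quad + \frac{6N^2}{(N-1)^{(2)}}\Big[(a^{(3)} - ba^{(2)} - ab^{(2)} + b^{(3)}) - \frac{(a-b)^{(4)}}{N-3}\Big]\mu_2\,\mu_1^2 + \frac{N^3(a-b)^{(4)}}{(N-1)^{(3)}}\,\mu_1^4
\end{aligned} \qquad (5.08)$$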
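The corresponding mean moments are:

$${}_{(a-b)}m_2 = \frac{(a+b)N - (a-b)^2}{N-1}\,m_2 \qquad (5.09)$$

$${}_{(a-b)}m_3 = \frac{(a-b)}{(N-1)^{(2)}}\,\{N^2 - 3N(a+b) + 2(a-b)^2\}\,m_3 \qquad (5.10)$$

$$\begin{aligned}
{}_{(a-b)}m_4 &= \big[(a+b)(N-a-b)\{N^2 - 6N(a+b) + 6(a+b)^2 + N\} \\
&\qquad + 16ab\,\{N^2 - 3N(a+b) + 3(a^2+b^2) + N\}\big]\frac{m_4}{(N-1)^{(3)}} \\
&\quad + \frac{3N}{(N-1)^{(3)}}\big[(a+b)^{(2)}(N-a-b)^{(2)} + 8ab\,\{(a+b)N - (a^2+b^2) - 2(N-1)\}\big]\,m_2^2
\end{aligned} \qquad (5.11)$$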
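Hence for the score-sum difference distribution on the assumption
that sampling is without replacement from a single universe:

$${}_{(a-b)}\beta_1 = \frac{(a-b)^2\{N^2 - 3N(a+b) + 2(a-b)^2\}^2\,(N-1)}{\{N(a+b) - (a-b)^2\}^3\,(N-2)^2}\;\beta_1 \qquad (5.12)$$

$$\begin{aligned}
{}_{(a-b)}\beta_2 &= \frac{(N-1)\big[s(N-s)\{N(N+1) - 6s(N-s)\} + 16ab\,\{N(N+1) - 3s(N-s) - 6ab\}\big]}{(N-2)^{(2)}\,\{s(N-s) + 4ab\}^2}\;\beta_2 \\
&\quad + \frac{3N^{(2)}\big[s^{(2)}(N-s)^{(2)} + 8ab\,\{s(N-s) + 2(ab - N + 1)\}\big]}{(N-2)^{(2)}\,\{s(N-s) + 4ab\}^2}
\end{aligned} \qquad (5.13)$$

in which s = a+b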
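The relations (5.12) and (5.13) lend themselves to verification by exhaustive enumeration when the universe is small. The Python sketch below lists every pair of disjoint a-fold and b-fold samples, computes the Pearson coefficients of the score-sum difference exactly, and sets them beside the values given by the two formulae as written above. The universe [0, 1, 1, 2, 4, 4, 7] and all function names are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def pearson_betas(outcomes):
    """beta1 and beta2 of a list of equally likely outcomes (exact)."""
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    m2, m3, m4 = (sum((x - mean) ** k for x in outcomes) / n for k in (2, 3, 4))
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def difference_betas_enumerated(universe, a, b):
    """Enumerate all disjoint (a-fold, b-fold) samples from one universe."""
    idx = range(len(universe))
    diffs = []
    for A in combinations(idx, a):
        rest = [i for i in idx if i not in A]
        sum_a = sum(universe[i] for i in A)
        for B in combinations(rest, b):
            diffs.append(sum_a - sum(universe[i] for i in B))
    return pearson_betas(diffs)


def difference_betas_formula(universe, a, b):
    """The same coefficients from (5.12) and (5.13), with s = a + b."""
    N, s = len(universe), a + b
    beta1, beta2 = pearson_betas(list(universe))
    w = s * (N - s)
    denom = w + 4 * a * b                      # equals N(a+b) - (a-b)^2
    b1 = Fraction((a - b) ** 2 * (N * N - 3 * N * s + 2 * (a - b) ** 2) ** 2 * (N - 1),
                  (N - 2) ** 2 * denom ** 3) * beta1
    first = Fraction((N - 1) * (w * (N * (N + 1) - 6 * w)
                                + 16 * a * b * (N * (N + 1) - 3 * w - 6 * a * b)),
                     (N - 2) * (N - 3) * denom ** 2)
    second = Fraction(3 * N * (N - 1) * (s * (s - 1) * (N - s) * (N - s - 1)
                                         + 8 * a * b * (w + 2 * (a * b - N + 1))),
                      (N - 2) * (N - 3) * denom ** 2)
    return b1, first * beta2 + second


if __name__ == "__main__":
    universe = [0, 1, 1, 2, 4, 4, 7]           # hypothetical 7-fold universe
    print(difference_betas_enumerated(universe, 3, 2))
    print(difference_betas_formula(universe, 3, 2))
```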


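For the 2-class universe in the customary symbolism:

$${}_{(a-b)}\beta_1 = \frac{(a-b)^2\,(q-p)^2\,(N-1)\,\{N^2 - 3N(a+b) + 2(a-b)^2\}^2}{pq\,(N-2)^2\,\{N(a+b) - (a-b)^2\}^3} \qquad (5.14)$$

$$\begin{aligned}
{}_{(a-b)}\beta_2 &= \frac{(N-1)}{(N-2)^{(2)}\,pq}\cdot\frac{\big[s(N-s)\{N(N+1) - 6s(N-s)\} + 16ab\,\{N(N+1) - 3s(N-s) - 6ab\}\big]}{\{s(N-s) + 4ab\}^2} \\
&\quad + \frac{3(N-1)}{(N-2)^{(2)}}\Big[(N+6) - \frac{2N^2}{s(N-s) + 4ab} - \frac{24ab\,N^2}{\{s(N-s) + 4ab\}^2}\Big]
\end{aligned} \qquad (5.15)$$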


When we sample without replacement from 2 identical universes,
or replace the first (a-fold) non-replacement sample before extracting
without replacement the second (b-fold) one, it is simplest to develop
the moments of the two types of difference distribution as follows:

Score-sums:

$${}_{(a-b)}\mu_k = E({}_a x - {}_b x)^k = \sum_{w=0}^{w=k} (-1)^w \binom{k}{w}\; {}_a\mu_{k-w}\cdot{}_b\mu_w$$

$${}_{(a-b)}m_k = \sum_{w=0}^{w=k} (-1)^w \binom{k}{w}\; {}_a m_{k-w}\cdot{}_b m_w \qquad (5.16)$$

Mean Scores:

$$E(d^k) = \frac{1}{a^k b^k}\,E(b\cdot{}_a x - a\cdot{}_b x)^k = \frac{1}{a^k b^k}\sum_{w=0}^{w=k} (-1)^w \binom{k}{w}\; b^{k-w} a^{w}\; {}_a\mu_{k-w}\cdot{}_b\mu_w \qquad (5.17)$$

The appropriate expressions for the moments of the a-fold and b-fold
score sums in the above are, of course, as developed in (4.08)–(4.10)
above.


From (5.16) in this way we derive

$${}_{(a-b)}\beta_1 = \frac{(N-1)\,(a-b)^2\,[(N-s)(N-2s) - 2ab]^2}{(N-2)^2\,[s(N-s) + 2ab]^3}\;\beta_1 \qquad (5.18)$$

$$\begin{aligned}
{}_{(a-b)}\beta_2 &= \frac{(N-1)}{(N-2)^{(2)}}\cdot\frac{s(N-s)\{N(N+1) - 6s(N-s)\} + 12ab\,(N-s)(N-2s) + 2ab\,\{N(N+1) - 6ab\}}{\{s(N-s) + 2ab\}^2}\;\beta_2 \\
&\quad + \frac{3N^{(2)}\,\big[s^{(2)}(N-s)^{(2)} - 2ab\,\{(N-s)(N-2s) + N - ab - 1\}\big]}{(N-2)^{(2)}\,\{s(N-s) + 2ab\}^2} + \frac{6ab\,(N-a)(N-b)}{\{s(N-s) + 2ab\}^2}
\end{aligned} \qquad (5.19)$$

in which s = a+b
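Similarly, we derive from (5.17):

$${}_{(a-b)}\beta_1 = \Big(\frac{1}{a} - \frac{1}{b}\Big)^2\,\frac{N^2\,(N-1)\,(NR-3)^2}{(N-2)^2\,(NR-2)^3}\;\beta_1 \qquad (5.20)$$

$$\begin{aligned}
{}_{(a-b)}\beta_2 &= \frac{(N-1)(RN-1)}{(N-2)^{(2)}}\Big\{\frac{RN-3}{RN-2} - \frac{RN(1-R)}{(RN-2)^2}\Big\}\,\beta_2 - \frac{(N-1)\,[3N^2(N+1)R - 2N(7N+1) + 6ab]}{(N-2)^{(2)}\,ab\,(NR-2)^2}\;\beta_2 \\
&\quad + \frac{3N^{(2)}}{(RN-2)^2\,(N-2)^{(2)}}\Big[(1-R)(NR-R-1)(NR-1) + 1 + \frac{1}{ab}\{3RN(N-1) - 2(N^2+N-1)\}\Big] \\
&\quad + \frac{6\,\{N^2 - ab\,(NR-1)\}}{ab\,(NR-2)^2}
\end{aligned} \qquad (5.21)$$

in which $R = \dfrac{1}{a} + \dfrac{1}{b}$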


The foregoing expressions, like those for the sample mean
in §4 above, give us some insight into the type of curve likely to be
satisfactory for purposes of quadrature. It will suffice to comment
from this viewpoint on the expressions for the first two Pearson
coefficients of the difference distribution defined by (5.12) and
(5.13) for the situation in which both samples come from the same
finite universe. Both expressions simplify greatly if we choose
samples of equal size (a = b), in which event ${}_{(a-b)}\beta_1 = 0$
and ${}_{(a-b)}\beta_2$ reduces to:

$$\frac{(N-1)}{2aN(N-2)^{(2)}}\,\{N(N+1) - 6a(N-1)\}\,\beta_2 + \frac{3(N-1)}{2aN(N-2)^{(2)}}\,\{(2a-1)(N-2a)^{(2)} + 8a^2(N-a) - 8a(N-1)\}$$

If a = N(N+1) ÷ 6(N−1) = b, it is thus apparent that the
difference distribution is symmetrical; and the value of the second
Pearson Coefficient is independent of $\beta_2$, i.e. of the structure
of the universe. On substitution of this sample size in the expression
above we find that ${}_{(a-b)}\beta_2$ reduces to 3(N−1) ÷ (N+1); but
the interpretation of this result is meaningful only within the
framework of the assumption that both N and a must be integers.
Evidently the coefficient of $\beta_2$ will be small if N = 6a, i.e.
each sample is a one-sixth fraction of the universe of choice. For
larger values of N we may thus say that an overall sampling fraction of
one third will ensure that the kurtosis of the difference distribution
is independent of the kurtosis of the u-s-d. More generally for
N = 6a, ${}_{(a-b)}\beta_2$ reduces to:

$$\frac{6(N-1)^2}{N^{(4)}}\,\beta_2 + \frac{3(N-1)^2\,(N^2 - 6N + 6)}{N^{(4)}}$$

The maximum finite value of $\beta_2$ occurs in the binary universe,
the frequencies of the classes being $N^{-1}$ and $(N-1)N^{-1}$
respectively, one class being then represented by only one member. The
second Pearson Coefficient of its u-s-d is (N² − 3N + 3) ÷ (N − 1).
This is its maximum value; and the maximum value of
${}_{(a-b)}\beta_2$ is therefore exactly 3. Thus the difference
distribution is necessarily platykurtic and the greatest contribution
which can be made by the term involving $\beta_2$ is
$6(N^2 - 3N + 3) \div N(N-2)^{(2)}$. The table below shows for various
values of N the values of the two terms in ${}_{(a-b)}\beta_2$
calculated on the assumption that $\beta_2$ has its maximum value, as
above.


   N    1st term   2nd term        N    1st term   2nd term
   6      1.75       1.25         30      0.21       2.79
  12      0.62       2.38         42      0.15       2.85
  18      0.38       2.62         60      0.10       2.90
  24      0.27       2.73         96      0.06       2.94
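The entries of the table above follow directly from the expression for N = 6a together with the maximum kurtosis (N² − 3N + 3) ÷ (N − 1) of the binary universe. A minimal Python sketch (the function name is an illustrative assumption) recomputes both terms in exact arithmetic and confirms that they sum to 3 for every N.

```python
from fractions import Fraction


def kurtosis_terms(N):
    """The two terms of the difference-distribution kurtosis for equal
    samples of size a = N/6, taking beta2 of the universe at its maximum."""
    beta2_max = Fraction(N * N - 3 * N + 3, N - 1)
    falling4 = N * (N - 1) * (N - 2) * (N - 3)          # N^(4)
    first = Fraction(6 * (N - 1) ** 2, falling4) * beta2_max
    second = Fraction(3 * (N - 1) ** 2 * (N * N - 6 * N + 6), falling4)
    return first, second


if __name__ == "__main__":
    for N in (6, 12, 18, 24, 30, 42, 60, 96):
        first, second = kurtosis_terms(N)
        print(N, round(float(first), 2), round(float(second), 2))
```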


Even if the u-s-d of the binary universe is very platykurtic
we therefore see that samples of size equal to 1/6th of the universe
will generate a symmetrical difference distribution having a second
Pearson Coefficient greater than or equal to 2.8 if N is greater than
or equal to 30. For any universe of 30 or more items, regardless of
the number of score classes and of items in each, there is good reason
to assume that the first two Pearson coefficients of the distribution
of the difference referable to equal samples of one sixth will lie very
close to their normal values. However, this does not suffice to
justify the conclusion that the normal curve will give an adequate
quadrature for the sample difference distribution. An examination
of how gratuitous such an assumption may be will indeed give us
some insight into circumstances which guarantee a good fit.

In particular, we recall the case of sampling from a 2-class
universe. Without restriction on the values of a and b, the difference
distribution is then definable as follows for a u-s-d of score values
differing by unit increment:


Difference Scores        −1            0             +1
Frequencies              a/N      (N−a−b)/N          b/N


When a = b and F = (a+b) ÷ N is the total sampling fraction, this
reduces to:


Difference Scores        −1            0             +1
Frequencies              F/2         1 − F           F/2


Whence ${}_{(a-b)}\beta_1 = 0$ and ${}_{(a-b)}\beta_2 = 3.0$ if a = b
and F = 1/3.

The difference distribution is then a special case of what we call
in a subsequent contribution of this series the burette universe, viz.:


Score                    −1            0             +1
Frequency                1/6          2/3            1/6

We may here make use of results obtainable from sampling in the
burette (infinite discrete 3-class) universe by stating at this stage
without proof the following conclusion: when the first two Pearson
coefficients of a distribution involving 20 score classes are very
close to their normal values, we may confidently invoke the normal
distribution for purposes of quadrature adequate for statistical
usage. It is therefore immaterial to examine the implications of the
foregoing formulae for ${}_{(a-b)}\beta_1$ and ${}_{(a-b)}\beta_2$ more
closely. It suffices to state of any finite unimodal universe that:

(a) the first 2 coefficients of the non-replacement difference
distribution w.r.t. a-fold and b-fold samples will lie very close to
their normal values if both the following conditions hold good: (i) the
sample sizes are equal (a = b); (ii) the total sampling fraction
(F = (a+b) ÷ N) is in the neighbourhood of one third;

(b) the normal curve will then give a satisfactory quadrature
if the distribution of the a-fold sample is referable to at least 10
different score values.


6. Numerical illustrations of sampling in the finite universe. 


In §5 above, we have intimated that two conditions are pre-eminently
relevant to the adequacy of the normal as a descriptive
curve for approximate quadrature of a unimodal r-fold sampling
distribution, viz.: (i) how closely the Pearson coefficients of symmetry
and kurtosis approximate to their normal values of 0 and 3
respectively; (ii) how many ($n_r$) score classes are specifiable by
frequencies other than zero. To give due weight to both considerations
last stated we need to recall one conclusion already stated and a
second implicit in the non-replacement condition:


(a) the condition that ${}_r\beta_2$ is near 3.0 is that the sampling
fraction F is in the neighbourhood of 0.35; and that ${}_r\beta_1$ is
near zero is that F = 0.5.


(b) the condition that $n_r$ is a maximum is that the u-s-d is
rectangular or U-shaped to ensure the minimum number of vanishing
terms in the hypergeometric series for the r-fold sample distribution,
whence $\beta_2 < 1.8$.


Having regard to these conditions, we here present tables of the
distribution of samples of 14 (F = 0.35) and 20 (F = 0.5) taken from
5-class 40-fold universes (A–F) as follows:


                   Score values of u.s.d.
Ref. No.     −2     −1      0     +1     +2        β1       β2
   A         14      4      4      4     14        0       1.27
   B          8      8      8      8      8        0       1.70
   C          5      8     14      8      5        0       2.24
   D          2      6     24      6      2        0       3.88
   E         12     10      8      6      4      0.206     2.03
   F          2      4     20     10      4      0.025     3.23

Tables I–VIII exhibit the appropriate r-fold (r = 14 or r = 20)
sample distributions with corresponding values of ${}_r\beta_1$ and
${}_r\beta_2$ for the above in juxtaposition to the normal distribution
for unit variance with due regard to the appropriate half-interval
correction.
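The Pearson coefficients of the six universes A–F, and the standard deviations which head Tables I–VIII, may be recomputed as in the following Python sketch. It relies on the standard result that the variance of the score sum of an r-fold non-replacement sample is r(N − r) ÷ (N − 1) times the variance of the unit sample distribution; the identifiers are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

SCORES = (-2, -1, 0, 1, 2)
UNIVERSES = {
    "A": (14, 4, 4, 4, 14),
    "B": (8, 8, 8, 8, 8),
    "C": (5, 8, 14, 8, 5),
    "D": (2, 6, 24, 6, 2),
    "E": (12, 10, 8, 6, 4),
    "F": (2, 4, 20, 10, 4),
}


def pearson_coefficients(freqs, scores=SCORES):
    """Return (beta1, beta2, variance) of a unit sample distribution."""
    n = sum(freqs)
    mean = Fraction(sum(f * x for f, x in zip(freqs, scores)), n)
    m2, m3, m4 = (sum(f * (x - mean) ** k for f, x in zip(freqs, scores)) / n
                  for k in (2, 3, 4))
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2, m2


def sample_sigma(freqs, r):
    """Standard deviation of the score sum of an r-fold sample drawn
    without replacement (finite-universe correction applied)."""
    n = sum(freqs)
    variance = pearson_coefficients(freqs)[2]
    return sqrt(float(Fraction(r * (n - r), n - 1) * variance))


if __name__ == "__main__":
    for name, freqs in UNIVERSES.items():
        b1, b2, _ = pearson_coefficients(freqs)
        print(name, round(float(b1), 3), round(float(b2), 2),
              round(sample_sigma(freqs, 14), 4), round(sample_sigma(freqs, 20), 4))
```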

We may get the salient features of these tables into focus if
we now tabulate side by side the area of the tail of the normal
distribution cut off at values near the 2σ (so-called significance)
level and the exact sum of frequency terms excluded. The results
show a remarkably close fit for a wide range of 40-fold universes.


Significance   Size of    Pearson Coefficients   Sum of excluded   Excluded area
   Level       Sample        of the u-s-d          frequencies       of Normal
                             β1        β2
   2.0788        14           0        1.27          0.01426          0.01488
   2.0831        14           0        1.70          0.01314          0.01395
   1.9360        14           0        2.24          0.01800          0.01903
   1.9486        14           0        3.88          0.01388          0.01604
   1.9795        14         0.206      2.03          0.00903          0.00886
   1.9110        14         0.025      3.23          0.00828          0.00927
   1.8883        20         0.206      2.03          0.02186          0.02241
   1.9887        20         0.025      3.23          0.01410          0.01561
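A comparison of this kind can be reproduced for any of the universes by counting the r-fold combinations which yield each score sum. The Python sketch below does so for the rectangular universe B with r = 14 and sets the exact tail beyond a score sum of 9 beside the normal tail cut off at 9.5 (the half-interval correction). The function names are illustrative assumptions, and the normal integral is evaluated with the error function of the standard library.

```python
from math import comb, erf, sqrt


def score_sum_distribution(freqs, r):
    """Exact distribution {score sum: probability} of an r-fold sample
    drawn without replacement; freqs maps each score to its item count."""
    N = sum(freqs.values())
    ways = {(0, 0): 1}            # (items drawn, score sum) -> number of ways
    for score, count in freqs.items():
        nxt = {}
        for (n, s), w in ways.items():
            for j in range(0, min(count, r - n) + 1):
                key = (n + j, s + j * score)
                nxt[key] = nxt.get(key, 0) + w * comb(count, j)
        ways = nxt
    total = comb(N, r)
    return {s: w / total for (n, s), w in ways.items() if n == r}


def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


universe_b = {-2: 8, -1: 8, 0: 8, 1: 8, 2: 8}
dist = score_sum_distribution(universe_b, 14)
variance = sum(s * s * p for s, p in dist.items())    # the mean is zero
sigma = sqrt(variance)
exact_tail = sum(p for s, p in dist.items() if s > 9)
normal_tail = 1.0 - normal_cdf(9.5 / sigma)

if __name__ == "__main__":
    print(round(sigma, 4), round(exact_tail, 5), round(normal_tail, 5))
```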


Table I. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe A.


σ = 5.2916          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.92

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.07461   0.07539         0.07461   0.07528
    0.1890       0.07337   0.07405         0.22135   0.22320
    0.3780       0.06973   0.07019         0.36081   0.36341
    0.5669       0.06392   0.06420         0.48865   0.49170
    0.7559       0.05664   0.05665         0.60193   0.60490
    0.9449       0.04850   0.04824         0.69893   0.70115
    1.1339       0.04006   0.03964         0.77905   0.78068
    1.3228       0.03188   0.03143         0.84281   0.84360
    1.5118       0.02449   0.02404         0.89179   0.89179
    1.7008       0.01813   0.01775         0.92805   0.92739
    1.8898       0.01290   0.01264         0.95385   0.95277
    2.0788       0.00881   0.00869         0.97147   0.97024
    2.2677       0.00580   0.00576         0.98307   0.98295
    2.4567       0.00365   0.00369         0.99037   0.98926
    2.6457       0.00220   0.00228         0.99477   0.99384
    2.8347       0.00126   0.00136         0.99729   0.99659
    3.0237       0.00069   0.00078         0.99867   0.99817
    3.2126       0.00036   0.00043         0.99939   0.99906
    3.4016       0.00018   0.00023         0.99975   0.99953
    3.5906       0.00008   0.00012         0.99991   0.99976
    3.7796       0.00004   0.00006         0.99998   0.99989
    3.9686       0.00001   0.00003         1.00000   0.99996


Table II. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe B.


σ = 4.3205          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.90

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.09120   0.09234         0.09120   0.09217
    0.2314       0.08890   0.08989         0.26900   0.27158
    0.4629       0.08234   0.08299         0.43368   0.43718
    0.6944       0.07244   0.07255         0.57856   0.58211
    0.9258       0.06049   0.06015         0.69954   0.70236
    1.1573       0.04790   0.04726         0.79534   0.79698
    1.3887       0.03592   0.03521         0.86718   0.86753
    1.6202       0.02548   0.02485         0.91814   0.91740
    1.8516       0.01705   0.01663         0.95224   0.95084
    2.0831       0.01074   0.01054         0.97372   0.97211
    2.3145       0.00635   0.00634         0.98642   0.98490
    2.5460       0.00351   0.00361         0.99344   0.99222
    2.7775       0.00181   0.00195         0.99706   0.99618
    3.0089       0.00086   0.00100         0.99878   0.99822
    3.2404       0.00038   0.00048         0.99954   0.99921
    3.4719       0.00015   0.00022         0.99984   0.99966
    3.7033       0.00005   0.00009         0.99994   0.99987
    3.9347       0.00002   0.00004         0.99998   0.99995
    4.1662       0.00001   0.00002         1.00000   0.99998


Table III. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe C.


σ = 3.6157          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.88

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.10867   0.11033         0.10867   0.11005
    0.2766       0.10483   0.10619         0.31833   0.32177
    0.5531       0.09407   0.09468         0.50647   0.51072
    0.8297       0.07845   0.07819         0.66337   0.66695
    1.1063       0.06070   0.05983         0.78477   0.78670
    1.3829       0.04347   0.04241         0.87171   0.87176
    1.6594       0.02872   0.02785         0.92915   0.92777
    1.9360       0.01743   0.01694         0.96401   0.96194
    2.2126       0.00966   0.00954         0.98333   0.98125
    2.4891       0.00485   0.00498         0.99303   0.99139
    2.7657       0.00219   0.00241         0.99741   0.99631
    3.0423       0.00087   0.00108         0.99915   0.99852
    3.3189       0.00030   0.00045         0.99975   0.99945
    3.5954       0.00009   0.00017         0.99993   0.99981
    3.8720       0.00002   0.00006         0.99997   0.99994


Table IV. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe D.


σ = 2.5660          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.81

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.15228   0.15547         0.15228   0.15450
    0.3897       0.14205   0.14410         0.43638   0.44120
    0.7794       0.11512   0.11474         0.66662   0.67008
    1.1691       0.08055   0.07849         0.82772   0.82741
    1.5588       0.04813   0.04613         0.92398   0.92050
    1.9486       0.02413   0.02329         0.97224   0.96791
    2.3383       0.00989   0.01010         0.99202   0.98869
    2.7280       0.00316   0.00377         0.99834   0.99652
    3.1177       0.00072   0.00121         0.99978   0.99908
    3.5074       0.00010   0.00033         0.99998   0.99968
    3.8971       0.00001   0.00008         1.00000   0.99996


Table V. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe E.


σ = 4.0414          ${}_r\beta_1$ = 0.002          ${}_r\beta_2$ = 2.89

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

   −3.9590       0.00001   0.00004         0.00001   0.00006
   −3.7116       0.00003   0.00010         0.00004   0.00017
   −3.4641       0.00010   0.00024         0.00014   0.00042
   −3.2167       0.00031   0.00056         0.00045   0.00099
   −2.9693       0.00085   0.00120         0.00130   0.00223
   −2.7218       0.00199   0.00243         0.00329   0.00469
   −2.4744       0.00421   0.00462         0.00750   0.00938
   −2.2269       0.00804   0.00827         0.01554   0.01772
   −1.9795       0.01408   0.01392         0.02962   0.03175
   −1.7321       0.02272   0.02202         0.05234   0.05389
   −1.4846       0.03403   0.03279         0.08637   0.08678
   −1.2372       0.04749   0.04592         0.13386   0.13275
   −0.9898       0.06202   0.06048         0.19588   0.19323
   −0.7423       0.07599   0.07493         0.27187   0.26808
   −0.4949       0.08755   0.08734         0.35942   0.35525
   −0.2474       0.09502   0.09549         0.45444   0.45075
    0.0000       0.09727   0.09871         0.55171   0.54925
   +0.2474       0.09398   0.09549         0.64569   0.64475
   +0.4949       0.08574   0.08734         0.73143   0.73192
   +0.7423       0.07387   0.07493         0.80530   0.80677
   +0.9898       0.06008   0.06048         0.86538   0.86725
   +1.2372       0.04610   0.04592         0.91148   0.91322
   +1.4846       0.03332   0.03279         0.94480   0.94611
   +1.7321       0.02267   0.02202         0.96747   0.96825
   +1.9795       0.01447   0.01392         0.98194   0.98228
   +2.2269       0.00866   0.00827         0.99060   0.99062
   +2.4744       0.00483   0.00462         0.99543   0.99531
   +2.7218       0.00251   0.00243         0.99794   0.99777
   +2.9693       0.00120   0.00120         0.99914   0.99901
   +3.2167       0.00053   0.00056         0.99967   0.99958
   +3.4641       0.00021   0.00024         0.99988   0.99983
   +3.7116       0.00008   0.00010         0.99996   0.99994
   +3.9590       0.00002   0.00004         0.99998   0.99998
   +4.2065       0.00001   0.00002         0.99999   0.99999


Table VI. Comparison between the Normal Integral and a 14-fold non-replacement
sample drawn from Universe F.


σ = 2.8781          ${}_r\beta_1$ = 0.0003          ${}_r\beta_2$ = 2.84

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

   −3.9957       0.00001   0.00004         0.00001   0.00007
   −3.6482       0.00006   0.00018         0.00007   0.00026
   −3.3008       0.00037   0.00060         0.00044   0.00089
   −2.9533       0.00147   0.00177         0.00191   0.00273
   −2.6059       0.00449   0.00463         0.00640   0.00751
   −2.2584       0.01116   0.01013         0.01756   0.01855
   −1.9110       0.02342   0.02229         0.04098   0.04118
   −1.5635       0.04246   0.04083         0.08344   0.08230
   −1.2161       0.06749   0.06617         0.15093   0.14863
   −0.8686       0.09499   0.09505         0.24592   0.24356
   −0.5212       0.11913   0.12100         0.36505   0.36415
   −0.1737       0.13358   0.13653         0.49863   0.50000
   +0.1737       0.13408   0.13653         0.63271   0.63585
   +0.5212       0.12037   0.12100         0.75308   0.75644
   +0.8686       0.09633   0.09505         0.84941   0.85137
   +1.2161       0.06833   0.06617         0.91774   0.91770
   +1.5635       0.04261   0.04083         0.96035   0.95882
   +1.9110       0.02309   0.02229         0.98344   0.98145
   +2.2584       0.01071   0.01013         0.99415   0.99249
   +2.6059       0.00416   0.00463         0.99831   0.99727
   +2.9533       0.00131   0.00177         0.99962   0.99911
   +3.3008       0.00032   0.00060         0.99994   0.99974
   +3.6482       0.00005   0.00018         0.99999   0.99993
   +3.9957       0.00001   0.00004         1.00000   0.99999


Table VII. Comparison between the Normal Integral and a 20-fold non-replacement
sample drawn from Universe E.


σ = 4.2366          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.897

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.09295   0.09416         0.09295   0.09298
    0.2360       0.09052   0.09157         0.27399   0.27674
    0.4721       0.08360   0.08423         0.44119   0.44488
    0.7081       0.07319   0.07335         0.58757   0.59128
    0.9442       0.06069   0.06030         0.70895   0.71182
    1.1802       0.04762   0.04693         0.80419   0.80578
    1.4162       0.03531   0.03454         0.87481   0.87500
    1.6523       0.02469   0.02405         0.92419   0.92330
    1.8883       0.01625   0.01584         0.95669   0.95518
    2.1243       0.01003   0.00986         0.97675   0.97506
    2.3604       0.00579   0.00581         0.98833   0.98780
    2.5964       0.00311   0.00324         0.99455   0.99335
    2.8325       0.00155   0.00171         0.99765   0.99682
    3.0685       0.00071   0.00085         0.99907   0.99855
    3.3045       0.00030   0.00040         0.99967   0.99938
    3.5406       0.00011   0.00018         0.99989   0.99974
    3.7766       0.00004   0.00007         0.99997   0.99990
    4.0126       0.00001   0.00003         0.99999   0.99996


Table VIII. Comparison between the Normal Integral and a 20-fold non-replacement
sample drawn from Universe F.


σ = 3.017          ${}_r\beta_1$ = 0          ${}_r\beta_2$ = 2.83

Standard Score
  Deviation          Frequency            Cumulative frequency
    (X/σ)         Exact     Normal          Exact     Normal

    0            0.12940   0.13223         0.12940   0.13168
    0.3314       0.12303   0.12516         0.37546   0.38094
    0.6629       0.10567   0.10614         0.58680   0.59269
    0.9944       0.08175   0.08065         0.75030   0.75398
    1.3258       0.05671   0.05490         0.86372   0.86417
    1.6573       0.03501   0.03349         0.93374   0.93168
    1.9887       0.01903   0.01830         0.97180   0.96879
    2.3202       0.00897   0.00896         0.98974   0.98707
    2.6516       0.00359   0.00393         0.99692   0.99514
    2.9831       0.00118   0.00154         0.99928   0.99835
    3.3145       0.00030   0.00055         0.99988   0.99951
    3.6460       0.00005   0.00017         0.99998   0.99986
    3.9775       0.00001   0.00005         1.00000   0.99997


Summary. 

Moments of the score-sum and mean-score distribution for the
general case of sampling without replacement from a universe of N
objects assignable to n classes each specifiable by a definitive score
disclose three outstanding common properties: (a) symmetry of the
half-universe sample distribution; (b) platykurtic r-fold sampling
distributions even for a highly leptokurtic u-s-d, for sampling
fractions in the range F ≈ 0.21 to F ≈ 0.79; (c) close correspondence
between the Pearson coefficients of a one-third sample from a sizeable
(N > 40) universe and their normal values.

Numerical examinations of distributions referable to different 
sampling fractions from universes defined by widely different 
contours illustrate these conclusions; and moments of difference 
distributions generated by sampling without replacement are also 
derived. 

Résumé. 

The moment sum and the distribution of the mean moments of a sample
obtained by drawing without replacement from a population of
N individuals divided into n classes, each delimited by defined
characteristics, show three marked properties in common:

a) A symmetrical distribution in a sample comprising half the
population.

b) Platykurtic distributions in an r-fold sample, even for a
strongly leptokurtic u-s-d, for sampling fractions in the range
F ≈ 0.21 to F ≈ 0.79.

c) Satisfactory agreement between the Pearson coefficients of
the distribution of a sample comprising one third of a population
of a certain size (N > 40) and their normal values.

Numerical studies of distributions referable to fractions of
populations defined by widely divergent contours illustrate these
conclusions. Moments of difference distributions obtained by drawing
without replacement are also derived.


Zusammenfassung. 


The moments of the score-sum and mean-score distributions for the
general case of a sample drawn without replacement from a
population of N individuals distributed over n classes, each
specified by definite characteristics, show three pronounced
properties in common:

a) symmetry of the sampling distribution for a sample consisting of
half the population;

b) platykurtic r-fold sampling distributions, even for a strongly
leptokurtic u-s-d, for sampling fractions within the range
F ~ 0.22 to F ~ 0.79;

c) close agreement between the Pearson coefficients for a sample
consisting of one third of a reasonably large population (N > 40)
and their normal values.

Numerical investigations of distributions referable to different
sampling fractions from populations defined by widely differing
contours illustrate these conclusions; moments of difference
distributions obtained by sampling without replacement are also
derived.
EXPERIENCE WITH THE ESSEN-MÖLLER
FORMULA IN HEREDITARY-BIOLOGICAL
PATERNITY ASSESSMENT


By DIETRICH WICHMANN, Tübingen


The growing importance of hereditary-biological expert opinions
in paternity suits has led judges and experts alike to wish that the
result of the hereditary-biological examination could be expressed
numerically. The formula developed by Essen-Möller, which
incidentally is not the only possibility, has probably become the
best known. Opinion on its value is, however, divided among
specialists. While E. Fischer, F. Lenz, O. Reche and especially
W. Ludwig and M. Weninger reject it, E. Geyer, L. Löffler, P. Kramp,
S. Koller and K. Tuppa have endorsed it, though in part with
reservations. It is therefore appropriate to test its practical
suitability on a further series of cases, E. Geyer having led the way
with a positive result.

First, however, the most important objections should be addressed,
although, in keeping with our topic, the theoretical reservations will
remain in the background. Thus a biological way of thinking is
contrasted with a mechanical, computational one; yet there is only
one way of thinking common to all sciences, namely the logical one.
Even if one does not hold that mathematics is the queen of the
sciences, one must concede that it puts the laws of thought into their
shortest and most general form.
It has further been objected that no objective probability can be
established for an event that has already occurred. In our case,
however, the only event that has occurred is the procreation of a
child, not the paternity of a particular putative father. For whether
a named man is the father of a child or not is precisely the problem
that confronts the expert and that he is to solve by
hereditary-biological methods. It is correct, however, that the
probability judgment concerns the certainty of the determination. It
must be conceded that the expression "probability of paternity" is
incorrect. What the Essen-Möller formula actually determines is the
probability with which a man against whom a claim is made can be
assigned to the group of true fathers or to that of false fathers. In
other words, assessment with the Essen-Möller formula treats the
individual case as a sample from a larger body of comparison
material with defined limits.


Essen-Möller starts from the following considerations:


1. The father will possess a trait of his child more often than a
falsely named man.

2. Agreement in a rare trait is of greater significance than
agreement in a common trait.

3. If mother and child also resemble each other in the trait, the
occurrence of the child's trait in the putative father carries less
evidential weight than if the mother lacked it.

4. The certainty of the inference of paternity grows with the
number of agreements.

The correctness of these principles is indisputable; they are
probably applied by all experts, with the emphasis placed on one or
another principle according to experience. What is new in
Essen-Möller is the combination of these principles in a single
formula.

The reference to possible fluctuations in the manifestation of a
gene is of grave importance. As is well known, Geyer in his first
paper started from particular gene frequencies and modes of
inheritance whose validity is in part thoroughly disputable. Löffler
would therefore for the time being restrict the application of the
formula to the serological traits. It is, however, by no means
necessary to enter the formula with particular modes of inheritance,
as the critics apparently assume. The so-called "critical value", i.e.
the frequency of the trait among false fathers (Y) in relation to that
among true fathers (X), can also be obtained empirically by
evaluating population and family studies. These series should not be
too small, however, especially if a fine subdivision into classes is
used. Geyer himself, incidentally, noticed this weakness and, in a
further paper that has apparently remained rather little known,
demonstrated the empirical calculation of the "critical value" using
the thenar pattern.
By using a large body of family material, possible fluctuations in
manifestation, or difficulties of classification for traits with
continuous transitions, also enter the formula. "Exclusions" thereby
become impossible, but the rarity of such constellations is given
sufficient weight in the overall assessment. The use of empirical
family material offers still further advantages. Since the frequencies
of particular child-mother-father constellations are counted, any
assortative mating in the choice of partner need not be specially
taken into account. If one further ensures that the filial generation
consists of young subjects, one can dispense with additional
calculations intended to compensate for the age difference. If boys
and girls are represented in equal proportions in the filial
generation, sex differences in the distribution of the trait need not
be evaluated in a separate calculation. If the family material is
large enough, boy-mother-father and girl-mother-father groups can of
course be formed, which makes the statement somewhat sharper. If age
and sex differences are demonstrable, the value Y, "frequency of the
trait in the population", should be calculated not from the total
population but from the adult male population.
The differing frequencies in different populations are of course
not without influence on the result. But it would be overestimating
their weight to require that the trait frequencies be determined
separately for every region in question, desirable though an
anthropological and hereditary-biological survey of individual
populations is. An example may show this. The frequency of the
blood-group gene B is 6.8% in western Germany (Bonn); in eastern
Germany (Königsberg) it is about twice as frequent, 13.2%. The
constellation child B, mother O, putative father B is therefore
undoubtedly more conclusive in western Germany than in eastern
Germany. In the above case an assignment probability of 88.5% can be
calculated for Bonn and one of 82.0% for Königsberg. For the
all-German average (gene frequency 9%) it is 85.5%. The differences
between populations must therefore be very considerable if they are
to influence the result decisively. Only when the dominant gene is
about three times as frequent, e.g. gene A with 29% in the national
average, is a substantial fall in the assignment probability
observed; for the constellation child A, mother O, putative father A
it is then 67.5%.
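
The dependence of the assignment probability on the gene frequency in this example can be sketched numerically. The following is a minimal illustration, not the author's own calculation: it assumes Hardy-Weinberg proportions, a single-trait probability of the form W = X / (X + Y), and a frequency of gene A held at 0.29 in every region (the text gives that figure only as a national average). The function name and its parameters are hypothetical.

```python
def assignment_probability_b(q_b, p_a=0.29):
    """Child B, mother O, putative father B, under Hardy-Weinberg (assumed).

    q_b: frequency of gene B; p_a: frequency of gene A (assumed value).
    """
    r_o = 1.0 - p_a - q_b  # frequency of gene O
    # X: share of phenotype B among true fathers. The true father passed
    # on B, so he shows phenotype B whenever his other gene is B or O.
    x = q_b + r_o
    # Y: share of phenotype B among men in general (genotypes BB and BO).
    y = q_b ** 2 + 2.0 * q_b * r_o
    return x / (x + y)


for region, q_b in [("Bonn", 0.068),
                    ("national average", 0.09),
                    ("Königsberg", 0.132)]:
    print(f"{region}: {100 * assignment_probability_b(q_b):.1f} %")
```

Under these assumptions the sketch gives values close to those quoted for Bonn and for the national average. The Königsberg value comes out somewhat lower than the 82.0% quoted, which is to be expected since the regional frequencies actually used for the other genes are not stated in the text.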

Finally, misgivings have been expressed about combining traits of
different kinds in one formula. The objection cannot be regarded as
sound, since such traits are also used in the concluding overall
assessment of a case. On the contrary, it is precisely the
computational treatment of a hereditary-biological examination that
first weights the differing evidence of the individual traits, taking
into account their frequency in the population and in particular
family combinations. In the Essen-Möller formula, too, a single trait
can establish a greater probability or improbability than 10 others.
A more intuitive estimate introduces undesirable subjective elements
into the assessment and can therefore lead to more or less serious
wrong conclusions.

This is perhaps the right place to discuss briefly the expression
"additive proof" (Additionsbeweis), which stems from legal
terminology and appears quite often in hereditary-biological
opinions. It denotes the accumulation of individual pieces of
circumstantial evidence which, added together, are combined into an
overall proof. The term is wrong, however, because in an overall
judgment the individual probabilities are combined not additively but
multiplicatively, as is also done in the Essen-Möller formula.
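
The multiplicative combination can be illustrated with a short sketch. It assumes the commonly cited form of the Essen-Möller formula, W = 1 / (1 + (Y1/X1)(Y2/X2)...), which the text itself does not print, and it treats the traits as independent; the function names are hypothetical.

```python
from math import prod


def critical_value(w):
    """Critical value Y/X implied by a single-trait probability w (0 < w <= 1)."""
    return (1.0 - w) / w


def combined_probability(critical_values):
    """Assumed form of the formula: W = 1 / (1 + product of the Y_i/X_i)."""
    return 1.0 / (1.0 + prod(critical_values))


# Two traits, each of which taken alone would give W = 70 %.
single = 0.70
w_two = combined_probability([critical_value(single)] * 2)
print(f"one trait: {single:.3f}, two traits: {w_two:.3f}")
```

The two critical values are multiplied, so the combined probability rises to about 84%; it is not obtained by adding the separate probabilities.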

In the natural sciences it is customary to test the correctness or
incorrectness of a theoretical consideration by experiment. For this
purpose we used as material 120 paternity opinions of the
Anthropological Institute of the University of Tübingen, rendered by
Gieseler and three co-workers. In all, 216 men were to be assessed, of
whom 112 could be designated as the father with greater or lesser
probability, while paternity was rejected for 104 men.

The necessary comparison material was provided by population
studies, most of which were carried out under Gieseler's direction by
his students and co-workers and some of which have been published.
In addition, frequency figures were calculated from the Institute's
entire collection of expert opinions. The necessary families could be
assembled partly from the population studies and partly from the
expert-opinion material. Family material collected specially for this
purpose was also used. The number of parental couples varied between
300 and 500, that of their children between 500 and 1000. Particular
modes of inheritance were assumed only for the blood groups of the
ABO system and the blood-corpuscle traits of the MN system; the gene
frequencies were calculated from W. Fischer's values for Germany as a
whole. For all the other traits used (31 in all) the critical values
were calculated empirically. I thank Fräulein Dr. Ehrhardt and Herr
Dr. Heinrich for kindly placing at my disposal several evaluation
tables calculated by them.

For the calculation, 3 measurements, 10 morphological traits of the
head and face, 2 serological traits and 16 traits of the dermal ridge
system of the fingertips, palms and soles were used. The selection of
these traits should by no means be regarded as final; a compromise had
to be struck between the traits that had been recorded in the
hereditary-biological examination on the one hand and the comparison
data actually available on the other. An extension and a partial
replacement are planned for the future.

The result of this evaluation can be seen in Figure 1.

The mean assignment probability is 92.48 ± 0.75% for the 112 true
fathers, but only 26.34 ± 1.82% for the 104 men wrongly claimed;
there is thus an unequivocal difference (66.14 ± 1.97%). The
difference in the arrangement of the variates is also noteworthy:
both groups show a skew distribution, but that of the false fathers
is considerably more spread out (σ = 17.94 ± 1.25) than that of the
true fathers (σ = 7.72 ± 0.52). For none of the true fathers was a
value obtained that lay in the range of the false fathers, and vice
versa; in the middle of the chart, however, the two groups approach
each other: the lowest value among the true fathers is 68%, the
highest among the false fathers 64%. With still larger series,
overlaps must therefore be expected. Such borderline cases are not
caused by the Essen-Möller formula, however; they are equally possible
with a more intuitive estimate of the result of the examination.

Fig. 1. Assignment probability in %.
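
The quoted error of the difference between the two means can be checked with a few lines. This is only a sketch: it assumes that the ± figures are standard errors of two independent means and that they are combined by the usual root-sum-of-squares rule, which the text does not state explicitly.

```python
from math import sqrt

mean_true, se_true = 92.48, 0.75    # 112 true fathers (from the text)
mean_false, se_false = 26.34, 1.82  # 104 false fathers (from the text)

difference = mean_true - mean_false
# Standard error of a difference of two independent means (assumed rule).
se_difference = sqrt(se_true ** 2 + se_false ** 2)
print(f"{difference:.2f} +/- {se_difference:.2f}")
```

With these inputs the sketch reproduces the 66.14 ± 1.97 given in the text.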

If in cases with 2 putative fathers one resembles the child while
the other does not, it can be said that the man who resembles the
child is its begetter, while the dissimilar man can in practice be
eliminated. In a further paper Essen-Möller, together with the
mathematician Quensel, has developed his earlier formula in this
direction. It cannot be my task to present and derive this formula in
detail; the original paper must be consulted. Suffice it to say here
that the "critical value" of the other man is set against the sum of
the two. If, for example, man A has a probability of 70% and man B
one of 20%, the following relations result, on the assumption that
the actual father is among those examined:
For man A: $\frac{4.00}{4.43} = 0.903$

for man B: $\frac{0.43}{4.43} = 0.097$

Whereas the probability for A was previously 3 1/2 times greater
than that for B, under the new procedure it becomes more than 9 times
as great. It should also be noted that the probabilities for 2 men
always add up to 100, which provides an additional check on the
calculation. If, on the other hand, a probability of 70% had also
resulted for man B taken separately, a probability of
$\frac{0.43}{0.86} = 50\%$ is calculated for both men, i.e. the case
cannot be decided, and the paternity of a hitherto unknown man must
also be reckoned with.
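
The two-man comparison described above can be sketched as follows. This is a minimal illustration of the rule as summarised in the text (each man's probability is the other man's critical value divided by the sum of both), not a reproduction of the Essen-Möller and Quensel derivation; the function names are hypothetical, and the critical value is assumed to be (1 - W) / W.

```python
def critical_value(w):
    """Critical value Y/X implied by a separately computed probability w (0 < w <= 1)."""
    return (1.0 - w) / w


def compare_two_men(w_a, w_b):
    """Return (probability for A, probability for B), assuming one of them is the father."""
    c_a, c_b = critical_value(w_a), critical_value(w_b)
    total = c_a + c_b
    # Each man's probability is the OTHER man's critical value over the sum.
    return c_b / total, c_a / total


for w_a, w_b in [(0.70, 0.20), (0.70, 0.70)]:
    p_a, p_b = compare_two_men(w_a, w_b)
    print(f"A alone {w_a:.2f}, B alone {w_b:.2f} -> A {p_a:.3f}, B {p_b:.3f}")
```

For 70% against 20% this gives roughly 0.903 and 0.097, and for two men at 70% each it gives 0.5 for both, in line with the worked figures. The two results always sum to 1, which is the check on the calculation mentioned in the text.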

This extended calculation was carried out for the 89 two-man cases
in our material. The most important results are as follows. Without
comparison, 1/[?] of the true fathers had a probability below 85%,
and 1/[?] exceeded the 99% limit. (Here and below, [?] marks a
numerator or denominator that is illegible in the source.) With
comparison, all true fathers have a probability of 85% or more,
[?]/[?] exceed the 90% limit instead of [?]/[?] previously, and half
of all true fathers lie above 99%, whereas without comparison only
1/[?] did so.

The improvement is even clearer in the assignment probability of
the falsely named men. Without comparison, the probability was 15%
or less for [?]/[?] of them, and only just under one tenth had an
assignment probability below 1%. (Here [?] marks a numerator or
denominator that is illegible in the source.) With comparison, the
proportions shift in the same way as for the true fathers: no falsely
named man has a probability higher than 15%, [?]/[?] have a
probability below 10% (previously 1/5), and half of all false fathers
have a probability of 1% or less.
True fathers
Assignment probability in %
                     65-69  70-74  75-79  80-84  85-89  90-94  95-x  total
Without comparison     2      4      2      4     10     18     49     89
With comparison        -      -      -      -      5     11     73     89


False fathers
Assignment probability in %
                      x-5   6-10  11-20  21-30  31-40  41-50  51-x  total
Without comparison    17     10     11     17     11     11     12     89
With comparison       73     11      5      -      -      -      -     89
It is perhaps of interest to present here, under comparison, the 3
borderline cases apparent from the figure. For a witness who had been
designated as the father "with a probability bordering on certainty",
a probability of only 68.5% could be calculated; for the defendant
the probability was only 6%. On comparison, the figures were 97.1%
for the witness and 2.9% for the defendant. In another case, for the
defendant, who had been designated as the father "with the greatest
probability", a probability of only 68.4% could be calculated, and
for the witness to multiple intercourse 21.8%. On direct comparison,
however, the probability for the witness amounted to only 10.6%,
while that for the defendant rose to 89.4%. In the third case the
probability for the "very improbable" witness was 64.0% and for the
"very probable" defendant 92.1%. On computational comparison the
probability for the witness fell to 13.2%, while for the defendant,
at 86.8%, it was nearly 6 times that of the witness.

Zusammenfassung. 


In conclusion we therefore arrive at the following results:

1. In our experience the Essen-Möller method, especially in its
extended form, appears thoroughly suitable for hereditary-biological
assessment.

2. For the numerical evaluation, traits that can be defined as
clearly as possible should be used. The serological traits can play a
large part in this connection.

3. It is advisable to base the calculation of the "critical values"
on empirically obtained comparison material; only for serological
traits can one enter the calculation with definite gene frequencies.

4. As matters stand at present, the Essen-Möller formula can offer
the expert valuable indications within the framework of the overall
examination. We consider its exclusive use in assessments in
paternity suits still premature; it is, however, a goal to be striven
for.


Summary. 
The following results were obtained: 
1. According to my experience the method of Essen-Möller appears
well adapted for a calculation of the probability of disputed paternity.

2. For the numerical evaluation one should use characteristics
as clearly definable as possible. Serological characters may be of great
importance in this connection.

3. It is advisable to base the calculation of the “critical values” 
on an empirically obtained control material. Only for serological 
characters is it possible to count upon definite gene frequencies.

4. Within the scope of the examination as a whole, the formula
of Essen-Möller may at present give the investigator valuable
indications. It seems, however, premature to rely on this formula
alone for a statement regarding a case of disputed paternity, though
that is a goal to strive for.

Résumé. 

We arrived at the following results:

1. In our opinion the Essen-Möller formula, particularly in its
extended form, can be used for a hereditary-biological opinion.

2. For the numerical evaluation, characters that are as clearly
definable as possible must be used. In this context the biological
characters of the blood can play a large part.

3. In calculating the "critical values" it is advisable to take
empirically obtained comparison material as the basis. Only in the
case of the biological characters of the blood can one count on
definite gene frequencies.

4. Applied within the limits of the examination as a whole, the
Essen-Möller formula can give the examiner valuable indications. It
seems, however, that this formula cannot yet be used alone in giving
an opinion on paternity cases, although that is a goal to pursue.
CAUSES OF EXCESS MALE MORTALITY IN MAN


By G. HERDAN 


Summary. 


The study of the relevant data and the discussion of the various 
hypotheses which have been put forward for the explanation of 
excess male mortality, leads to the conclusion that the explanations 
of the phenomenon, according to which its cause is sought in in- 
creased occupational risk, or in genetical differences between the 
sexes, or in sex-dimorphic physiological and endocrinological 
differences, are not mutually exclusive, but that each of them has 
its proper sphere of application for the explanation of the phenomenon 
in question at different ages of life, and for different aspects of it. 


I.


The notion of the impartiality of death which gave rise to the
concept of death as The Leveller and inspired Holbein to the creation
of his "Dance of Death" woodcuts, with Death approaching
indiscriminately high and low, The Duke and The Pedlar, is changing fast.

We know now, if we did not in a vague way know before, that
death is not so impartial as it seemed to our less sophisticated and
less informed ancestors. Although he will eventually grab The Duke
as he does The Pedlar, yet he hesitates somewhat when confronting
the former, and is rather quick in eliminating the latter. He seems to
have fewer scruples in slaying the lower classes. He thus, in a way,
differentiates between rich and poor, high and low. Moreover, he
appears to have his preferences with regard to the sexes, and that
rather differently from what might be expected.

In the eyes of death the male appears to be the weaker sex. We 
have arrived today at the conclusion that, for some reason or other, 
the male, in virtue of his maleness, is less viable than the female. 
Under certain circumstances the male, because of a greater inherent 
fragility, succumbs more easily to the force of death. This phenomenon
has been reviewed by F. A. E. Crew, F.R.S., not only for the human
species but also for other mammals, birds and insects, both in the
open and under controlled conditions of experimentation (in his
presidential address for Section D [Zoology] of the British Association
in September, 1937). He has shown that the whole course of sex
mortality in pre-natal life, infancy and subsequent age periods is
consistent with the view that the male in man is the inherently
weaker sex, more prone to death from diseases at all times¹. It may
be that during life the male is still the stronger, but when it comes to
dying he has decidedly less resistance. We may also express this
by saying that the force of death is stronger when directed against
the male. For a comprehensive presentation and thorough discussion
of the problems involved, see C. Gini, «Il sesso dal punto di vista
statistico», Metron, Roma.

Looking closer we find that this effect is due to many of the
weapons death uses, the causes of death, having themselves a
greater affinity to the male. It becomes more and more clear that it is
only the lack of detailed knowledge which was conducive to the 
formation of the striking overall picture of death as the leveller, whose 
action appeared as the very embodiment of chance. As our knowledge 
grows, this picture in which we distinguished at first nothing but 
life and death, light and dark, begins to show all the complexity of 
the network of correlations between the causes of death and the 
classes of humanity. 


II.


The phenomenon of male excess mortality, though possessing a 
fairly high degree of generality as regards country, time and age of 
life, is by no means uniform in all these respects, nor can it be said to 
have complete generality. It varies to some extent from country to 
country, and changes from time to time as regards intensity, and 
for certain ages of life it gives place to the opposite phenomenon, 
female excess mortality. The following tables of the ratio of male to
female death rate, which are based upon material by P. Delaporte²,
show that there are characteristic differences in excess male mortality
from place to place and from time to time.

¹ "The Sex Ratio". Presidential address for Section D (Zoology) of the British
Association, September 1937.


² P. Delaporte, Évolution de la mortalité en Europe depuis l'origine des
statistiques de l'état civil, «Statistique Générale de la France», Paris 1941.
Table 1. Sweden.


Age 1750 1780 1800 1820 1840 1860 1880 1900 1920 
0 Lil SMBIU G14 Ae Git ie) Gio) 18s 1,30 
1 1,20 1,03 1,05 1,06 1,02 1,08 1,06....108 177 
2 0,98 1,00 1,03 103 Li lll 109 1408 1,22 
3 O97 15003) Gis (1,10 4105. fda 03106 Li 
4 jE E OAL NRT Oxe) sheer BIKMs Roki ax 
5 ERT WAT e Se Raw aes Le wee 
6 1,08 1,08 1,03 1,06 1,14 1,13 1,06 1,02 1,16
7 1,08 1,08 1,01 1,13 1,07 1,11 1,06 0,98 1,17
8 Ml Bel ,07 le ihe 112 Ts 1028 100 tr 
9 1,06 1,02 1,09 1,28 1,22 1,05 1,06 1,00 1,17
10 EE RTS Git Gprity mec pan prank Cones Cant hl 
11 1,10 1,10 1,05 1,23 1,14 1,09 0,98 1,04 1,11
12 Ol PSA) Galen 116 G7 71,02 40,98, 0,93" 1,60 
13 1,03 1,10 1,14 1,11 1,14 1,03 0,95 0,83 0,86
14 1,05 1,10 1,14 1,09 1,17 0,98 0,95 0,76 0,77
15 1,08 1,13 1,14 1,09 1,14 0,98 0,95 0,75 1,00
16 1,09 1,15 1,13 1,06 1,11 0,98 0,96 0,97
17 Ml F115 291,15 (106 Loe 4104. 91,00 9 “iis 
18 Ris 5 S216 ec Lie" 4208 Si Tien icore a1.31 
19 Bie) ad ls Wetter) © 113 clleneel iseee1 20 
20 “AU NTS eS CR ge 
21 200«214,16=—«*2,26 2526s? dA 8 
22 $19 g 217/18 L25) 086 Gl 27. 91S sell Gis 
23 120 116 1,24 125 41,24 1,10 1,09 1,09 
24 Beit 43.16) 2.28.6 04,96 | 4122) e108” 107, 41,07 
25 119 117 121° 1,22 1,19 1,08 1,05 1,05 
26 MiSs SELIG) 55119) 123 1.21 © (1,08) 410705102 
27 BY bese it wele2 Lit. 106 072 1.05 
28 mos 21l 118. “221 17 106. 1,07 S02 
29 06 10) g LIF tte «11S. | 1105" 05) V2 
30 107 12109 «21414 «4117 «1,14 1,05 1,05 1,00 
31 1,08 1,08 1,14 1,15 1,13 1,03 1,04 1,01
32 1,08 1,10 1,14 1,15 1,11 1,04 1,02 0,98
33 09 «6 09ssd2Ssid3s«dzSsid203—s1,02 0,98 
34 Pig mw atlii wL13) his i1 51.03P (102-0 91,00 
35 Pig meds: eels. “Gli s40 gid alee 31.05 
36 “15 6 °815 137 213 (1,10 (1,04 1,00 
31 Vig 118 118 4113 1,10 1,04 1,00 
38 $15 e119 122 “114 Tilo fide © <1,00 
39 Vig 122 «124 «16 sSs«1,05 1,00 
40 Cy 1 i ee ae 
22 #1,95 41,27 1,19 4114 1,08 1,0 
#2 124-127. «130«C21ssTSS 109,03 
43 6 429. 13k 1,22. ai,lT G1,10at,06 
44 2g «13132 s«24 = -,20 1,13 1,06 
45 20.0 w13%i51134 125 41,22. gis its 
46 129 134 132 «(127 «124 116 1,09 
47 i ites ease 9, 1.26 2 1 
48 Tals kb ikl Shab 2a ll 20 eal Go LO 
49 fee Wise TS Beh 129" LIT ete 
50 1 35etn 1 3Guuel SGawial. 22. NUE, 32 wD td 
Ay 1,35 1,33 1,33 1,20 , 
33 138 37 135 1,33 1,31 41,20 1,09 


Tables 1 to 7 after N. Federici, “Statistica”, Vol. X, pp. 274-320 (1950). 


Acta Statistica, Vol. III, Fasc. 4 (1952)
Age 1750 1780 1800 1820 1840 1860 1880 1900 1920


53 1.81 21,370 <2j35) 91383" 2 om gt O7 
54 1.26 . 14% 0,35) wll ee elem cee UD 
55 1.26 1,28 “Hse 1,29" “20 20 0 
56 127 124) - 1,90 So ed 0 
57 129. 3,25. 4,25, 1,280. (124 5 Mi 
58 199 ‘14 “U21 “zs "12 *ias 
59 1.96 © 7195s To ieee Seni AS 
60 192, S20. Blo 95 — 9192. is 
61 (1s —Vik. SUt0n ooze © oles eet 
62 Lis) hice SLi “ier Sis PL 
63 1164 Gide Its oie 0s Wale 
64 130, Vid S04 or 0 erie, | ie 
65 1.20 UR 109 Po ea 
66 Lite, Maa, Ti. Ai ple eet fo 
67 Wis. Opto. (2. “his 8s il 
68 Tigh? Vege” Fry" FES SEES” As 
69 Li, (f= is 212, abe 
70 110. 253 a7. Fits Sige. ~~ Sis 
71 109 “RI? 015 this 207  ct.4 
72 {Tome TL? "14 Sale Naa 
73 112 woli0) 153 “hiss it The 
74 Pili wil) sii? “ie ids, ae 
75 Dio “bas ripe Sh fee eri 
16 1.09, 411. “Vid 709 iGo 

17 1.08 (1,12 8.15 “10s. “806 

78 PATS) "1.100* Sts. | ilo ee 64 

79 Tip SIO PE. STIt seas 

80 109 “T4s ls “tao “Sate 

81 £05 *tt TE A2° Shee? “ae 

82 P03:,6 c106.m8 10d) eto) las 

83 162, 107 “Tis “Ta. jas 

84 ogy’ "07 9 “29 ato" “a 

85 1.01 109) . 1130189. 1,63 

86 102 f10 9Td4 Gos "04 

87 100° “as Fre” Ae FS 

88 tin 80>  8hto. Toe a3 

89 112 “Tis 35 4118. tas 

90 12 as Mae ase Fal 

91 Ce oe ae ee EN 

92 ili 14 “tr ari 099 

93 Lis eyes gitar 708 

94 VG games he Fm ey Cg ke 

95 Lik” 4s. “liners. ens 

96 1,12) Weta SBS Be 

97 tld» Wiia S718 Ras 

98 113 “Ys She” a 

99 Lele etd i6 HL Aswiiaia 

100 E12. S245. “aa picts 


The distribution of excess male mortality over the ages of 
life in the various countries of Europe differs considerably as regards 
extent and degree, and furthermore, the phenomenon has changed 
differently in different countries with time. 


From the record we have for Sweden, which goes back to 1750, it 
Table 2. Norway.


Age 1840 1860 1880 1900 1920 Age 1840 1860 1880 
0 Eee (18 «1.20 1.31 46 1,09 1,07 1,14 
1 1,03 1,03 1,10 0,92 409 Melle 406. (1,17 
2 1,04 1,02 1,20 1,00 48 1,12 1,08 1,21 
3 1,00 0,88 1,17 1,33 a0 wile elon | 1004 
4 1,00 0,92 1,00 1,00 50. Ew 1,13) 1,29 
5 1,05 1,05 1,04 1,00 SL) “E206 15 

6 0,97 0,99 1,10 1,00 S20 1,22 “116 

7 1,00 0,99 0,98 1,08 1,00 S35 1,23." 10 

8 1,00 0,99 1,06 1,06 1,00 545 125, 120 

9 1,03 0,98 0,98 1,00 1,00 B50) 2b 122 

10 1,00 0,98 0,98 0,97 1,00 56.) 12% *1,28 

11 1,00 0,98 0,95 0,94 Sha 1,280 1335 

12 1,00 0,95 0,93 0,84 58. 4,29. 1:26 

13. 1,02 0,93 0,89 0,78 59" 1525, 1428 

14 1,00 0,91 0,88 0,83 60) 1,24, 1,27 

15 1,00 0,96 1,00 0,87 61" B2ap 14 

16 1,00 1,06 1,15 1,00 ee 2p 1,24 

ioe ee RE 133. 1,20 639 “s10) eh 

3 «1G «6135 «621,46 ~=—1,35 64 1,16) 1,21 

19 1,34 1,46 1,53 1,54 65 1,15 1,18 

20 147 «#4152 1,57 1,55 66 1% “1,17 

21 158 ($4156 1,62 1,57 67) 1,201.20 

22 1,58 41,57 1,63 1,57 68 1,19 1,23 

23 151 «(21453)«C1,59.Ssi1,51 69 1,16 1,26 

24 143 41,45 1,55 1,45 70) “14 129 

25 1,35 1,40 1,49 1,42 Ty) cle 

26 1,26 1,33 1,43 1,41 724 | V,15 

97 ¥i6 «4125 «41,36 @61,38 q3~ ~ 1,19 

awaits Ete 147. 1,31 74> * 1:16 

27) Se 6455, «(128 | (1,21 75 1,14 

BY ee LY Se OS be 76 «©6113 

31 1,03 1,06 1,16 ait em eb 

20° 10s 40a 1,13 qe ihi2 

33 «1,00 «1,00 ~—-1,09 719, 1:32 

34 1,00 0,99 1,06 $0. «1,15 

= t01 0.99 1,03 81 1,14 

36 ~=«-:1,00 0,991, 03 S25 1S 

37 ~©+1,00 0,99 1,01 ose 810 

38 =s-1,01 «0,99 s:1,01 84 1,09 

39 1,02 1,00 1,03 85 1,09 

40 1,01 1,01 1,03 26. 07 

41 1,02 1,01 1,04 Sim 61,05 

42 1,03 1,04 1,06 88 ~=—-1,04 

43 1,02 1,03 1,07 89 1,08 

44 1,03 1,05 1,10 90 ~—-1,10 

45 1,06 1,06 1,11 


is clear that excess male mortality occurred at least as far back
as 1750. There is further in all countries a general rise in excess male
mortality as time proceeds, although there are exceptions to this
trend. One of these exceptions is Sweden, where the excess male
mortality has declined in the last 100 years or so.
Table 3. England.


Age 1840 1860 1880 1900 1920 Age 1840 1860 1880 
0 ie 126 pel 2A ees 51 27, («1,24 «1529 
1 12060 1.045 05008 Sele 52 1,26 1,24 1,32 
2 1,00 1,00 0,95 1,06 1,06 53 1,26 1,24 1,34 
3 0,99 1,00 1,00 1,00 1,00 54 Page 23 sed 
4 1200 IE LO eel 02 0-94 IELS 55 Lat 323 1,87 
5 1,06 1,04 1,00 0,96 1,20 56 125) 152536 
6 W059 1,045 91500 5 302) Eis DT 1523) 1,27 1,36 
tt 0:9908 1.07) el le SOO eA 58 (23° Esl 1536 
8 0:96" 1035 71:06" 00) 11 59 1,26 1,31 

9 0390. 1502" 1,045 1,00 bis 60 1530 «©=—s,30 

10 0,90 1,00 1,00 1,00 1,08 61 131) “1528 

We 0,89 0,92 1,00 1,00 1,00 62 1,30 1,29 

12 0,83 0,94 1,00 1,00 1,00 63 ef aut 3 | 

133 0,84 0,97 1,00 0,95 1,00 64 1,26 1,33 

14 0,80 0,97 0,96 1,00 0,94 65 2s 233 

15 0,88 0,98 0,93 1,00 0,95 66 26" 130 

16 0,89 0,95 1,00 1,00 1,00 67 1,28 1,27 

17 0,92 0,96 1,03 1,04 1,04 68 130 41,23 

18 0,96 0,96 1,06 1,03 1,08 69 1,28 1,30 

19 0,99 0,98 1,05 1,03 70 1324 635 

20 C1 99S 08 el OG 71 Tai 136 

21 1,01 0,98 1,10 1,06 72 MLS — E3s 

22 02a O0 Reel Oeeel 0G 73 LT «30 

23 1,04 1,02 1,10 1,06 74 EAS E27 

24 OS OZ eee 1206 15 1,14 1,26 

25 03a OSs 12 09 76 1,18 1,25 

26 105 LOA Tie 09 fi 1,20 1,26 

27 T03eeel COn mel Ll eet 2 78 1,20 1,30 

28 1,04 1,06 1,13 1,09 79 1,18 

29 OSM Ic OGe ello 09 80 it 

30 DOA OS eae toa 09 81 tba iy 

30 1304055 1055 el L509 81 1,17 

31 LOA ts Oeste mel et 82 1,18 

32 LOS 10S elo ete 83 1,17 

33 15 06)eee 10 9R ed 2 ee ey 84 1,16 

34 LOSE LO alee 85 Kis 

35 LL ST Ti Nb 86 TUS 

36 Lille ses eel 2A eee 2 87 1,12 

37 LSS) Delo antes 88 it 

38 Loe lbs Sal. 2o eaters 89 1,12 

39 Uli, auletlin: Pilkees 90 V1 

40 Le Le Learn 91 1,11 

41 TIRVMS o Asie A 92 Lo 

42 LO 2129 93 id 

43 WADE nea ess 94 1,10 

44 I20 eel 2oe m3 0 95 1,09 

45 1,22 1,24 1,30 96 1,08 

46 iets © at ay aS P- 97 1,06 

47 i23en 1258 91,32 98 1,06 

48 1520 selena 99 1,05 

49 e265 2a lesh 100 1,04 


50 125 12659 1,30 


Looking at the tables, one finds that there are considerable 
differences from country to country as regards the distribution of 
excess male mortality over the ages of life. Most countries show an 
excess female mortality for certain age groups. Thus in Norway, 

Table 4. France. 


1820 1840 1860 1880 1900 


DS leg 1,00 1,19 1,21 
1,03 1,23 1,10 1,04 0,95 
1,02 1,10 1,06 1,07 1,03 
1,01 1,03 0,98 1,02 1,10 
0,98 1,02 1,01 1,11 0,97 
1,02 0,97 0,93 
0,98 1,00 0,93 0,95 0,91 
1,02 0,91 0,92 0,94 0,92 
1,03 0,95 0,89 0,92 0,91 
0,98 0,85 0,79 0,93 0,93 
0,86 0,80 0,81 0,91 0,92 
0,80 0,78 0,79 0,88 0,96 
0,77 0,77 0,73 0,85 0,88 
0,76 0,79 0,76 0,87 0,85 
0,77 0,81 0,82 0,86 0,75 
0,80 0,85 0,79 0,87 0,76 
0,82 0,87 0,79 0,88 0,90 


0,84 0,88 0,85 0,92 0,93 
0,88 0,92 0,91 0,99 1,04 
0,92 0,98 1,03 1,05 1,12 
0.99 102 41,14 1,14 1,19 
Lie be ie 211 1,19 
1,23 1,28 1,41 #4221419 1,17 
14 C«iaC (tsi TOC C81,15 
120 1,26 1,32 «1,14 ~~ 14,09 
110 115 1,22 41,10 .1,07 
0,99 1,00 1,13 1,07. 1,07 
0.94 0,91 1,07 1,06 1,09 
0,90 0,90 1,01 1,07 1,14 
0,87 0,87 0,98 1,09 1,128 
0.86 0,89 0,99 1,13 1,25 
0,87 0,91 1,00 1,15 1,30 
0,88 0,94 1,01 1,18 1,35 
0,90 0,96 1,05 1,20 1,39 
0,91 0,97 1,08 1,22 ~~ 1,43 
0,93 0,99 1.10 1,25 1,48 
0,93 1,01 1,13 1,26 1,52 
0,95 1,03 1,17 1,28 
0,96 1,07 1,19 1,29 
0,98 1,09 1,23 1,30 
Lor 1,12 025 1,32 
103 114 1,29 1,34 
107 114 11,31 1,37 
109 1,17 1,34 1,39 
110 1,21 41,37 1,43 
111 1,23 1,39 1,46 
112 1,24 1,41 ~~ 1,47 
112 1,24 1,42 1,48 
111 1,24 1,43 1,50 
bit. 125 142 a2 



1820 


1,12 
1,14 
1,16 
1,18 
1,20 
1,19 
1,19 
1,19 
i 
a7 
1,17 
1s 
1,18 
1,17 
1,16 
1,16 
1,15 
1,15 
sha 
1,16 
1,14 
1,14 
1,16 
1,16 
1,14 
1,15 
1,18 
1,17 
1,15 
Vis 
1,15 
1,15 
1,13 
1,13 
Ll 
111 
1,13 
1,13 
1,12 
ll 
1,10 
1,09 
1,07 
1,07 
1,06 
1,06 
1,06 
1,05 
1,04 
1,05 


1840 


1,26 
1,26 
1,27 
1,27 
1,31 
1,30 
1,31 
1,30 
1,30 
1,30 
1,30 
1,31 
1,31 
1,30 
1,29 
1,27 
1,26 
1,26 
1,27 
1,26 
1,24 
1,24 
1,25 
1,26 
1,25 
1,24 
1,25 
1,26 
1,22 
1,21 
1,20 
1,20 
1,18 
17 
1,15 
1,13 
13 
1,10 
1,13 
111 
1,10 
1,09 
1.07 
1,07 
1,06 
1,06 
1,06 
1,05 
1,04 
1,05 
1860 


1,40 
1,40 
1,37 
1,40 
1,43 
1,43 
1,42 
1,44 
1,44 
1,45 
1,44 
1,45 
1,45 
1,45 
1,44 
1,41 
1,41 
1,40 
1,43 
1,39 
1,36 
1,35 
1,36 
1,36 
1,38 
1,36 
1,35 
1,35 
1,33 
1,30 
1,27 
1,25 
1,95 
1,22 
1,19 
1,16 
1,14 
111 
1,13 
111 
1,10 
1,09 
1,07 
1,07 
1,06 
1,06 
1,06 
1,05 
1,04 
1,05 


Denmark, we find a persistent excess female mortality for the age 
groups between 10 and 15, and 30 and 35 approximately. In the 
Netherlands the second age group is very much extended and reaches 
from approximately 26 to 43. There is also a third age group for 


1880 


1,56 
1,59 
1,61 
1,62 
1,64 
1,61 
1,59 
1,60 
1,59 
1,63 
1,63 
1,62 
1,63 
1,62 
1,61 
1,61 
1,62 
1,59 
1,58 
1,57 
1,52 
1,51 
1,50 
1,49 
1,49 
1,52 
1,49 
1,47 
1,45 
1,41 
1,37 
1,34 
1,32 
1,29 
1,25 
1,21 
1,19 
1,16 
1,13 
iii 
1,10 
1,09 
1,07 
1,07 
1,06 
1,06 
1,06 
1,05 
1,04 
1,05 
Age 


CSDADANEWNKE SO 


1840 1860 

0,97 
0,89 0,92 
0,85 0,83 
0,80 0,80 
0,76 0,75 
0,76 0,73 
0,78 0,75 
0,82 0,80 
0,87 0,87 
0,97 0,94 
1,06 1,02 
1,17 +1,09 
WI iio 
(ie iOS 
ThOg eos 
1,00 1,00 
0,93 0,94 
0,91 0,90 
0,88 0,86 
0,86 0,84 
0,86 0,86 
0,86 0,86 
0,85 0,87 
0,86 0,90 
0,87 0,93 
0,89 0,96 
0,91 0,99 
0,93 1,01 
0,95 1,04 
0,97 1,07 
0,99 1,10 
OL) Leia 
1202) “Was 
1,06 ~ 1,16 
ThOO emlel ey 
1,13 1,18 
lie ee0 
Ly: 
25.) 1222 
1,30 1,24 
1,32 1,26 

Table 5. Denmark. 


1900 


1,21 
1,05 
0,98 
1,00 
1,00 
0,98 
1,00 
0,90 
0,96 
1,00 
0,89 
0,89 
0,89 
0,89 
0,85 
0,83 
0,88 
0,93 
1,00 
1,03 
1,03 
1,06 
1,03 
1,03 
0,94 
0,91 
0,88 
0,82 
0,82 
0,82 
0,80 
0,80 
0,83 
0,83 
0,86 
0,86 


1920 


1,07 
ts 
1,20 
1,00 
1,00 
1,05 
1,13 
We, 
1,18 
1,22 
p23 
1,29 
1,13 
1,25 
1,30 
1,25 


1840 


1,37 
1,41 
1,45 
1,46 
1,44 
1,42 
1,43 
1,41 
1,39 
1,38 
330 
1,30 
1,31 
1,31 
1,31 
1,29 
1,22 
uP alin 
1,12 
1.15 
1,15 
1,12 
i P| 
1,08 
1,07 
1,06 
1,05 
1,07 
1,05 
1,02 
1,00 
1,00 
1,04 
1,06 
1,07 
1,10 
1,06 
1,04 
1,05 
1,08 
1,09 
TSLO 
Use 
1,10 
1,11 
1,10 
el 
1,10 
1,11 
Legg 


1860 


1,27 
1,28 
1,28 
1,27 
1,27 
1,27 
1,26 
1,26 
1,22 
1,22 
1,22 
1,20 
117 
1,14 
1,16 
1,19 
1,12 
1,09 
1,05 
1,03 
1,03 
1,00 
1,01 
1,01 
1,00 


1880 


1,04 
1,05 
1,06 
1,06 
1,05 


Age 


WODADAUBRWNH OS 


1,21 
1,22 
1,01 
0,99 
1,23 
1,08 
1,02 
0,98 
1,00 
0,99 
1,02 
1,02 
0,96 
0,95 
0.89 
0,83 
0,79 
0,86 
1,00 
1,19 
1,23 
1,27 
1,28 
1,25 
1,18 
1,13 
1,07 
1,01 
0,96 
0,92 
0,90 
0,86 
0,85 
0,85 
0,85 
0,85 
0,86 
0,87 
0,87 
0,89 
0,89 
0,92 
0,94 
0,97 
0,99 
1,01 
1,04 
1,08 
1,11 
1,14 
1,16 
Lit 


1880 


1,21 
1,07 
(tei 
1,19 
120 
1,09 


Table 6. Netherlands. 
1920 


1.31 
1,11 
1,14 
1,25 
1,00 
1,13 
1,15 
i bye 
1,20 
1,22 
1,22 
1,10 
1,10 
1,10 


1840 


1,24 
1,26 
1,27 
1,28 
1,29 
1,26 
1,26 
1,26 
1,25 
1,21 
1,19 
1,20 
1,22 
1,22 
1,18 
1,15 
1,13 
1,11 
1,15 
1,15 
1,13 
Lit 
1,09 
1,09 
1,08 
1,07 
1,10 
1,10 
1,07 
1,06 
1,06 
1,05 
1,06 
1,07 
1,09 
1,09 
1,07 
1,05 
1,06 
1,07 
1,08 
1,09 
1,08 
1,07 
1,08 
1,09 
1,08 
1,08 
1,09 
1,08 


1860 


1,18 
1,19 
1,17 
1,18 
1,17 
1,18 
1,18 
1,15 
1,12 
1,14 
1,15 
1,12 
1,09 
1,14 
1, 
1,08 
1,07 
1,05 
1,04 
1,07 
1,06 
1,06 
1,04 
1,06 
Germany 
1880 1900 
1,00 0,99 
1,06 0,95 
0,90 1,03 
0,95 0,98 
0,97 0,95 
0,91 0,89 
0,84 0,88 
0,77 0,82 
0,73 0,78 
0,70 0,74 
0,76 0,79 
0,83 0,87 
0,90 0,98 
1,00 1,06 
TO 1,13 
ital 1,50 
i100 1,43 
Hil 1,40 
1,09 1,35 
1,07 1,32 
1,06 lez 
1,06 1,26 
1,04 120 
1,01 1,18 
1,00 1,20 
0,99 1,20 
0,97 LOA 
0,97 1,25 
0,97 1,29 
0,99 1S 
1,00 1.37 
1,03 
1,04 
1,06 
1,09 
ele 
el? 

1,20- 
125 
130 
oe 
Leo 
1,41 
1,46 
ila 
1,54 
1,63 
ae? 


1,80 


1920 


1,20 
1,07 
1,07 
1,00 
1,00 
1,02 
1,07 
1,05 
1,06 
1,07 
1,00 
0,90 
0,80 
0,72 
0,68 
0,81 


Table 7. 
1,19 
1,05 
1,03 
1,03 
1,00 
0,93 


Finland 
1900 


1,22 
0,61 
1,02 
0,95 
0,96 
0,91 
0,91 
0,89 
0,90 
0,96 
0,95 
0,90 
0,90 
0,95 
1,05 
1,13 
1,16 
1,22 
1,32 
1,33 
1,32 
1,31 
1,30 
1,26 
1,26 
1,20 
1,17 
1,08 
1,03 
1,00 
1,00 
0,97 
1,03 
1,12 
1,23 


1920 


1,21 
1,05 
1,10 
1,13 
1,06 
1,00 
1,00 
1,05 
1,06 
1,07 
1,08 
1,09 
1,20 
1,30 
1,18 
1920 


Hie Mi 
1,02 
1,05 
1,09 
1,00 
1,02 
0,97 
1,00 
1,00 
1,04 
1,05 
Switzerland 
1880 1900 
1,18 1,20 
1,00 1,04 
0,99 1,02 
1,03 1.04 
1,02 1,02 
1,00 1,00 
1,00 0,97 
1,02 0,94 
0,98 0,96 
0,95 0,96 
0,91 0,96 
0,90 0,91 
0,84 0,83 
0,81 0,80 
0,75 0,79 
0,74 0,84 
0,81 0,91 
0,86 0,94 
0,92 0,92 
0,94 0,95 
0,96 0,95 
0,96 0,95 
0,96 0,98 
0,97 1,00 
0,98 1,00 
0,98 1,03 
1,00 1,05 
1,00 1,05 
1,00 1,08 
1,02 Hae} 
1,03 1,16 
1,03 23 
1,07 1,29 
1,08 
1,10 
Vis 
1,18 
alc 
pail 
1,24 
Al 
131 
1.33 
1,37 
1,41 
1,44 
1,46 
1,48 
1,49 
15 
1,58 
1,59 
1,59 
1920 


1,93 
1,15 
1,00 
1,15 
1,06 
1,00 
0,96 
1,05 
1,06 
1,00 
1,00 


1,00 
0,92 

female excess mortality in the Netherlands between 45 and 55, at 
least in the year 1880. 

In England, there used to be a pronounced excess female 
mortality between 10 and 20 which, however, in this century has 
contracted to the age group 14-15. In France we find a similar 
contraction of excess female mortality from 10-20 to between 11 
and 16. In France, however, there used to be a large excess female 
mortality between 36 and 39 which, with time, has disappeared. The 
same contraction phenomenon of the two age groups for excess female 
mortality mentioned above is also found in Finland, Germany and 
Italy; Switzerland shows only the first of these groups, and that 
contracts very much in this century. 

The explanation of this phenomenon seems fairly obvious. 
The first period represents that of adolescence and puberty, and it 
is well known that the crisis of puberty in girls is accompanied by 
deeper organic changes than in boys. The second period for which 
there is, in general, an excess female mortality is that in which 
the female sex is exposed to the hazards of child-bearing. It is quite 
in accordance with this interpretation that, as these hazards become 
less and less through advances in medicine, the age group with 
excess female mortality should contract more and more as time goes 
on, and that in some countries with a high standard of living and 
of hygiene it should give place by and by to excess male mortality for 
these very ages. This has happened, for instance, in England, Sweden, 
Norway and Switzerland. 

III. 

The most obvious explanation of excess male mortality is an 
increased risk for males in the environmental conditions and an 
increased occupational risk. For example, excess mortality in males 
due to violence is, no doubt, very often due to the fact that males are 
exposed, or do expose themselves, to a greater extent, to risks of that 
description. Since, however, during the present century conditions 
of work have improved considerably as regards safety measures 
and hygiene, and hours of work have been reduced, one would 
expect a decrease of excess male mortality on that account, whereas 
the opposite is the case: excess male mortality is on the increase in our 
time. Another fact which may be adduced against this type of 
explanation is that industrial occupation of females is on the increase, 
and that therefore they are more exposed to occupational risk than 
before. 

Finally there is a pronounced excess male mortality in the 
youngest age groups and also in the highest age groups for which 
the explanation of occupation or environmental risk is obviously 
not suitable. 

The hypothesis of the increased occupational and environmental 
risk run by males as an explanation for excess male mortality can 
therefore have only a limited validity. It may explain certain 
special cases of the phenomenon and it may have application to 
those ages which expose themselves to risks of dying by acts of 
violence, and to occupations where predominantly male workers are 
exposed to the pathogenic influence of mineral or metal dust, gas, 
fumes, etc., but it cannot be regarded as a sufficient explanation of 
the phenomenon in general. 


IV. 


The phenomenon in question first attracted attention in the 
form of excess male infant mortality. It was investigated by F. Lenz¹ 
for Germany as a whole, Bavaria, France, Spain, Italy, Austria, 
Hungary, England, Sweden and Norway. The hypothesis put forward 
by him was that certain genetic differences between the sexes may 
be regarded as responsible for excess male mortality. 

This hypothesis is based upon the fact that the genetic structure 
of males, in the human species, is fundamentally different from that of 
females. 

In man, the male is the heterogametic sex, and as such possesses 
one x-chromosome, whereas the female, as the homogametic sex, 
possesses two. Thus, the female has two parallel sets of genes whereas 
the male has, strictly speaking, no such parallel set, because the 
differential segments of his y-chromosome do not exactly correspond 
to those of his x-chromosome. 

It follows that if a recessive gene for a certain disease or condition 
is carried in a differential segment of the x-chromosome, it is at once 
uncovered in the case of the heterogametic individual, and if in its 
action such a gene is disadvantageous, deleterious or lethal, it will 
find expression in the phenotype of the individual. If, on the other 
hand, the individual is homogametic there is always a chance that the 
same differential segment which carries the recessive gene in one 
chromosome may carry in the other chromosome a compensating 
gene and the expression of the recessive gene in the phenotype would 
be prevented. 


¹ Die Übersterblichkeit der Knaben im Lichte der Erblichkeitslehre. Archiv 
für Hygiene, XCIII, pp. 126-150. 

Lenz found corroboration for his hypothesis in the fact that the 
increase of infant excess male mortality with time was, in general, 
accompanied by a decrease of the death rate of infants, males and 
females together. He argues as follows: if the lack of resistance to 
certain diseases were due to certain recessive genes, and if these 
recessive characteristics were sex-linked or at least sex-limited, 
then these diseases would find expression more often as general health 
conditions improved. As a consequence, the downward trend in the 
general mortality rate of infants should be accompanied by an 
upward trend of excess male mortality. 

This he found confirmed in all the countries for which he had 
collected data. His method was to compare the ratio of male to 
female infant death rate with the general infant death rate and 
calculate the correlation co-efficient for the two series. He obtained, 
invariably, a significant negative correlation which he regarded as 
support for his hypothesis that excess male mortality was due to 
innate differences between the sexes. To make his argument clear, 
let us suppose that in a given population males had a death rate of 
1% and females of 0.8%, so that out of 100 000 males we would expect 
1000 to die per year, and out of 100 000 females 800; the excess male 
death rate would then be 1.25. If now general health conditions 
deteriorate, say, due to an epidemic, and the death rate of each sex 
increases by, say, 0.5% in absolute terms, the male death rate would 
go up to 1.5% and the female death rate to 1.3%, and the excess male 
death rate would now be only about 1.15. 

Conversely, an improvement in health conditions affecting 
both sexes in the same way would result in an increase in the excess 
male mortality. 
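The arithmetic of this illustration can be checked with a short sketch. It assumes, as the example does, that the same absolute amount is added to (or taken from) both death rates; the function names are illustrative only and do not come from Lenz.

```python
def excess_ratio(male_rate, female_rate):
    """Ratio of male to female death rate (the 'excess male mortality')."""
    return male_rate / female_rate


def shifted_ratio(male_rate, female_rate, delta):
    """Ratio after the same absolute change delta is applied to both rates."""
    return excess_ratio(male_rate + delta, female_rate + delta)


# Rates in per cent, as in the illustration: males 1%, females 0.8%.
base = excess_ratio(1.0, 0.8)
worse = shifted_ratio(1.0, 0.8, 0.5)    # epidemic: both rates rise by 0.5
better = shifted_ratio(1.0, 0.8, -0.3)  # hypothetical improvement: both fall by 0.3

print(round(base, 3), round(worse, 3), round(better, 3))
```

An equal absolute rise lowers the ratio and an equal absolute fall raises it, which is the mechanism the argument relies on.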

The certainly remarkable phenomenon of the negative correlation 
between the two series, is not confined to infant mortality but can be 
extended to mortality for all ages taken together, and thus to the 
general death rate. 

If we wish to compare the overall death rate for persons (male 
and female together) with the excess male mortality, for a series of 
years, we must apply two corrections: one for the difference in age 
composition of the population from year to year, and another to 
correct for difference in age composition between males and females 
in a given year. We therefore use in place of the crude death rates 


Herdan, Causes of Excess Male Mortality in Man 365 


for male and female together, the comparative mortality index (the 
year 1938 is taken as the base year), and in place of the crude ratio 
of male to female mortality, the adjusted ratio (table 8). 


The two series of data are for quinquennial intervals from 
1841-1845 to 1941-1945 (table 3 of the Registrar General’s Statistical 
Review of England and Wales for the year 1949, Tables, Part I, Medical, 
H. M. Stationery Office, 1951). 


Table 8. 

Years        Comparative Mortality     Adjusted ratio of male 
             Index (base year 1938)    to female mortality 
1841-1845    2,179                     1,096 
1846-1850    2,360                     1,088 
1851-1855    2,276                     1,099 
1856-1860    2,177                     1,095 
1861-1865    2,253                     1,116 
1866-1870    2,242                     1,132 
1871-1875    2,220                     1,150 
1876-1880    2,130                     1,160 
1881-1885    2,018                     1,152 
1886-1890    1,997                     1,164 
1891-1895    1,993                     1,161 
1896-1900    1,885                     1,178 
1901-1905    1,716                     1,191 
1906-1910    1,572                     1,198 
1911-1915    1,495                     1,226 
1916-1920    1,450                     1,282 
1921-1925    1,220                     1,240 
1926-1930    1,167                     1,268 
1931-1935    1,104                     1,276 
1936-1940    1,070                     1,337 
1941-1945    0,956                     1,414 


The correlation co-efficient results as -0.97, which is highly 
significant and in full agreement with the theory put forward by 
Lenz. 
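As a rough check, the product-moment correlation of the two columns of table 8 can be recomputed directly. The sketch below uses the figures as transcribed here, reading the 1901-1905 ratio as 1,191 (an assumption about a damaged entry), so the result need not agree to the second decimal with the value quoted in the text; it should, however, come out strongly negative.

```python
from math import sqrt

# (comparative mortality index, adjusted male/female ratio), 1841-45 to 1941-45
TABLE_8 = [
    (2.179, 1.096), (2.360, 1.088), (2.276, 1.099), (2.177, 1.095),
    (2.253, 1.116), (2.242, 1.132), (2.220, 1.150), (2.130, 1.160),
    (2.018, 1.152), (1.997, 1.164), (1.993, 1.161), (1.885, 1.178),
    (1.716, 1.191), (1.572, 1.198), (1.495, 1.226), (1.450, 1.282),
    (1.220, 1.240), (1.167, 1.268), (1.104, 1.276), (1.070, 1.337),
    (0.956, 1.414),
]


def pearson_r(pairs):
    """Product-moment correlation co-efficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r = pearson_r(TABLE_8)
print(f"r = {r:.2f}")
```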

However, in regarding highly significant negative correlation 
co-efficients between infant or overall mortality and excess male 
mortality as a support for the genetical hypothesis, it should be borne 
in mind that we are dealing with the correlation of two time series, 
and that although two events may appear correlated in time they may 
have nothing to do with one another. It is therefore only in 
conjunction with the reasonableness of the genetical hypothesis that we can 
regard the negative correlation co-efficient between the two series 
as a support for the hypothesis. 

The hypothesis has been the subject of a thorough investigation 
by M. Greenwood and E. M. Newbold.¹ 

For the correct evaluation of the hypothesis it is necessary to 
consider the results reached by Greenwood and Newbold. They 
examined infant mortality in England and Wales taking into con- 
sideration both the static and the dynamic variation, that is, the 
variation according to place and to time. For all causes together 
they found Lenz’s criterion, that is the negative correlation between 
the mortality of infants and excess mortality of infant males, 
confirmed for both dynamic and static variation. For special causes 
of death they found it confirmed for diarrhoea, and, on the whole, 
also for tuberculosis, congenital debility and respiratory diseases. 
For diphtheria, croup and measles they found a negative correlation 
for certain series, but not for all. For whooping cough, for which 
female mortality is notoriously in excess, they found the correlation 
positive for all groups. They did not regard their results as conclusive 
evidence for the correctness of Lenz’s theory. 

They also investigated the theoretical side of correlating the 
two series and it is of interest to consider their analysis. What Lenz 
correlated was the ratio of male to female death rate and the death 
rate of infants, male and female together. 

If $x$ is the rate of male mortality, that is in this case the ratio of 
deaths under one year of age to births in the year, and $y$ that of 
female mortality, what Lenz is correlating is $x/y$ and 
$\frac{Ax + By}{A + B}$.* But in his series $\frac{A}{A+B}$ and 
$\frac{B}{A+B}$ are approximately constant, and neglecting the 
difference in the sex ratio at birth, each may be taken as equal to 
0.5, so that approximately we have the correlation of $x/y$ with 
$x + y$. To a first approximation this is equal to 

$$
r_{x/y,\;x+y} \approx
\frac{\dfrac{\sigma_x^2}{\bar{x}} - \dfrac{\sigma_y^2}{\bar{y}}
      + r_{xy}\,\sigma_x\sigma_y\left(\dfrac{1}{\bar{x}} - \dfrac{1}{\bar{y}}\right)}
     {\sqrt{\left(\dfrac{\sigma_x^2}{\bar{x}^2} + \dfrac{\sigma_y^2}{\bar{y}^2}
      - \dfrac{2\,r_{xy}\,\sigma_x\sigma_y}{\bar{x}\,\bar{y}}\right)
      \left(\sigma_x^2 + \sigma_y^2 + 2\,r_{xy}\,\sigma_x\sigma_y\right)}}
$$

where $\bar{x}$, $\bar{y}$ are the mean rates, $\sigma_x$, $\sigma_y$ their 
standard deviations and $r_{xy}$ the correlation between $x$ and $y$. 


¹ M. Greenwood and E. M. Newbold, On the Excess Mortality of Males in the 
first year of life. Biometrika, Vol. 17, 1925, p. 327. 

* A, B are the numbers of male and female births, resp., in the year. 


The last term of the numerator becomes negative if $\bar{x}$ is greater 
than $\bar{y}$, that is, if the average male mortality exceeds the average 
female mortality (given that $x$ and $y$ are positively correlated). It 
follows that unless the first term, 
$\frac{\sigma_x^2}{\bar{x}} - \frac{\sigma_y^2}{\bar{y}}$, is positive and 
greater in absolute value than the last term, the correlation 
co-efficient between $x/y$ and $x + y$ must be negative. Therefore, if 
$\frac{\sigma_y^2}{\bar{y}} > \frac{\sigma_x^2}{\bar{x}}$ and if 
$\bar{x} > \bar{y}$, then the correlation co-efficient will be negative. 
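The sign conditions just stated can be tried out numerically. The sketch below assumes the usual first-order (delta-method) expression for the correlation of a ratio x/y with a sum x + y, with numerator var(x)/mean(x) - var(y)/mean(y) + r·sd(x)·sd(y)·(1/mean(x) - 1/mean(y)); this form is an assumption, not a quotation from Greenwood and Newbold, and the function name and input values are purely illustrative.

```python
from math import sqrt


def approx_corr_ratio_sum(mx, my, sx, sy, r):
    """First-order approximation to corr(x/y, x+y).

    mx, my: means of x and y; sx, sy: standard deviations;
    r: correlation between x and y.
    """
    numerator = sx**2 / mx - sy**2 / my + r * sx * sy * (1 / mx - 1 / my)
    var_ratio = sx**2 / mx**2 + sy**2 / my**2 - 2 * r * sx * sy / (mx * my)
    var_sum = sx**2 + sy**2 + 2 * r * sx * sy
    return numerator / sqrt(var_ratio * var_sum)


# Male mean above female mean, and var(y)/mean(y) > var(x)/mean(x).
male_excess = approx_corr_ratio_sum(mx=0.15, my=0.12, sx=0.010, sy=0.012, r=0.9)

# Roles of the sexes reversed (female mortality in excess).
female_excess = approx_corr_ratio_sum(mx=0.12, my=0.15, sx=0.012, sy=0.010, r=0.9)

print(round(male_excess, 2), round(female_excess, 2))
```

With the two conditions satisfied the approximation is negative; reversing the roles of the sexes reverses the sign.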


It appears now from Greenwood’s paper that the two conditions 
under which the correlation co-efficient becomes negative are again 
correlated in the case of simple sampling, that is if the variation 
within the series of death rates can be regarded as following the 
Bernoullian Law. 


In that case, the standard deviation of the percentage, $\sqrt{pq/n}$, is 
related to the mean value of the p’s, and the absolute variation 
increases with the value of the percentage. The standard deviation, 
however, increases more slowly than the mean value, with the 
consequence that as the percentage increases, the relative variation, 
that is the standard deviation divided by the mean, decreases. 
We may therefore say that for the case of simple sampling an excess 
of $x/y$ over unity is accompanied by an excess of the relative 
variation of y over that of x. 
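The behaviour of absolute and relative variation under simple (Bernoullian) sampling can be tabulated in a few lines. The sketch below takes the standard deviation of a proportion as the square root of pq/n; the sample size and the values of p are arbitrary illustrations, not data from the paper.

```python
from math import sqrt


def binomial_sd(p, n):
    """Standard deviation of a proportion p under simple sampling of size n."""
    return sqrt(p * (1 - p) / n)


def relative_variation(p, n):
    """Standard deviation divided by the mean (co-efficient of variation)."""
    return binomial_sd(p, n) / p


n = 10_000                      # hypothetical number of births
rates = (0.05, 0.10, 0.15, 0.20)  # hypothetical death rates

for p in rates:
    print(p, round(binomial_sd(p, n), 5), round(relative_variation(p, n), 4))
```

For rates below one half the standard deviation rises with p while the relative variation falls, so the sex with the higher mean rate shows the smaller relative variation.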

It appears now that Greenwood regarded these algebraic 
consequences of a difference between x and y and of the corresponding 
relative variations as something which detracted from the value of 
the hypothesis as an explanation of the phenomenon in question. It 
goes without saying that such an interpretation is not justified. The 
negative correlation co-efficient, provided it is significant, is not less 
real because we understand its mathematical structure and because 
it appears as an algebraical necessity in the case of simple sampling. 


In addition to this, Greenwood himself has shown that the 
condition of simple sampling hardly ever applies to these experimental 
series. It follows, therefore, that the negative correlations, as far 
as they result from his series, cannot be regarded as algebraical 
consequences of the mean values of male and female mortality. He 
found that even in the case of non-simple sampling a high absolute 
variance and a small relative variance did result for that sex which 
has the higher average mortality. Instead of regarding this as a 
confirmation of the reality of the negative correlation co-efficient, he 
considered it, in a way, as an argument against Lenz’s theory. 

It is only for the cases where the mortality of either sex is very 
small that, although the average male mortality may be higher than 
the average female mortality, and the absolute variation of the former 
also higher than that of the latter, the coefficient of variation, 
or the relative variability, does not follow the usual pattern, that 
is, decreases as the absolute variation increases. This accounts for the 
correlation in such cases being sometimes positive. This is the case 
for instance for some series of the epidemic diseases like diphtheria, 
croup and measles. Apart from the fact that for these diseases the 
causes are environmental, and one would not therefore expect, accord- 
ing to Lenz’s theory, an excess male mortality due to recessive charac- 
teristics to express itself, it appears that the variation in these cases 
comes very near to that of the simple sampling scheme, which would 
account for the fact that there is only very little and hardly signifi- 
cant correlation. That in the case of whooping cough, the correlation 
co-efficient results as positive, is again quite in keeping with the 
theory, because the excess mortality here is simply reversed and 
according to the structure of the formula, there must result in this 
case a positive correlation co-efficient. 

Greenwood’s analysis, therefore, need by no means be considered 
as invalidating the genetic theory, nor, to do him justice, does he 
make any statement to that effect. He only considers his results not 
completely satisfactory for upholding Lenz’s theory without, 
however, attempting to come to a definite conclusion as to inherent 
differences in male and female variation. 


V. 

We have next to consider the phenomenon of the change of 
excess male mortality with age of life, and the changes which the 
excess male mortality for specific age groups has undergone with 
time. 

Regarding the trend of excess male mortality with age of life, 
there appears to be a systematic difference for excess mortality 
according to the age of the persons concerned. In the first year of life 
the excess mortality is very high; for the extreme ages, 70 and higher, 
it is fairly low; it reaches a peak between about 20 and 60. This 
appears to be a more or less general phenomenon at all times and in 
all countries although there are marked differences in this respect. 
For England and Wales, the excess male mortality for specified age 
groups expressed as a percentage of the female death rate is given in 
table 9 (based upon table 5, Registrar General’s Statistical Review of 
England and Wales for the year 1949. Tables Pt. I Medical). 


In order to test the differences in excess male mortality between 
the ages of life one might adopt various methods. In any test it 
would be a question of ascertaining whether the differences can be 
regarded as real or whether they ought to be still considered only 
fluctuations due to chance. 


The method adopted by the Italian writer Nora Federici¹ is that 
of the calculus of co-graduation. The index of co-graduation can be 
calculated either between territorial series or between the temporal 
series. If there was no selection as regards age of life, that is if for 
successive years of life the excess male mortality remained more or 
less the same, then the co-efficient of graduation should be more or 
less equal to unity all throughout the table of co-graduation. If, on 
the other hand, the index of co-graduation does not maintain itself 
at the level of unity but decreases as the age groups become more 
distant from one another, then this indicates that there is a systematic 
factor which produces the change in the excess male mortality 
according to years of life. It is shown then that this is the fact 
whether we carry out the co-graduation according to territories or 
according to time. 

As regards territories, we find the trend of the excess male 
mortality with years of life specially pronounced in Sweden, England 
and the Netherlands and less so in France. 

¹ Nora Federici. «La Mortalità Differenziale dei due sessi e le sue possibili 
cause.» Statistica. Vol. X, No. 3. p. 274. 


Acta Statistica, Vol. III, Fasc. 4 (1952) 24 


Regarding the trend of excess male mortality with time, an 
investigation by Lexis may also be quoted covering the mortality in 
Belgium during the years 1841-1860. The method he used for 
ascertaining whether the excess male mortality from year to year for a given 
age group remained more or less the same, or whether it differed 
systematically, was to calculate the standard error of a percentage in 
two ways: first by the combinatorial formula $\sqrt{pq/n}$ and then by the 
root-mean-square formula. A difference between the two results, 
provided it proves significant, is then taken as an indication that 
the phenomenon has changed with time. The following table shows 
the result. 


Table 10. Male Mortality per 1000 Females. 


Age          Mean   Range         R       r      Q 
Stillborn    1348   (1281-1410)   23,4    23,6   0,99 
0- 1 M.      1359   (1316-1417)   18,5    22,1   0,84 
1- 2 M.      1323   (1237-1445)   42,4    37,1   1,15 
2- 3 M.      1253   (1158-1390)   36,2    40,8   0,91 
3- 4 M.      1224   (1099-1394)   49,1    42,9   1,14 
4- 5 M.      1284   (1174-1429)   52,7    50,6   1,04 
5- 6 M.      1257   (1117-1422)   56,2    52,9   1,06 
6- 9 M.      1179   (1109-1257)   34,3    30,4   1,13 
9-12 M.      1085   (1014-1182)   31,1    27,8   1,12 
1- 2 Y.      1028   (966-1087)    23,9    15,6   1,53 
2- 3 Y.       990   (926-1065)    23,5    22,1   1,06 
3- 5 Y.       947   (879-1019)    23,7    20,1   1,16 
5-10 Y.       878   (821-945)     28,7    17,4   1,66 
10-15 Y.      713   (620-847)     45,5    18,4   2,95 
15-20 Y.      770   (685-919)     37,9    18,3   2,1 
20-25 Y.     1095   (965-1234)    40,2    23,9   1,7 
25-30 Y.      905   (804-1027)    32,8    21,3   1,5 
30-40 Y.      826   (766-909)     29,7    13,9   2,1 
40-45 Y.      943   (812-1115)    50,3    21,6   2,3 
45-50 Y.     1143   (853-1468)    88,9    25,8   3,4 
50-55 Y.     1124   (837-1353)   104,4    24,2   4,3 
55-60 Y.     1055   (850-1305)    93,9    21,8   4,3 
60-65 Y.      962   (848-1140)    64,8    18,5   3,5 
65-70 Y.      913   (789-1151)    71,7    16,6   4,3 
70-75 Y.      906   (766-1150)    65,9    15,9   4,1 
75-80 Y.      903   (811-1019)    36,0    16,8   2,1 
80-85 Y.      866   (781-940)     24,5    19,5   1,26 
85-90 Y.      800   (721-904)     33,9    26,3   1,29 
over 90 Y.    693   (638-831)     28,7    38,1   0,75 

R = standard error of percentage calculated by the root-mean-square formula. 
r = standard error of percentage calculated by the combinatorial formula, $r = \sqrt{pq/n}$. 
Q = R/r. 


The first column gives the mean excess male mortality per 
thousand, the second column the range which we encounter in the 
twenty years in question, the third column gives the standard error 
of the percentage of male births calculated by the root-mean-square 
formula, the fourth column the standard error calculated by the 
combinatorial formula, and the last column their ratio. 

The results are very interesting in two respects. We first find 
that up to about the first year of life the ratio Q is sensibly equal to 
unity which is interpreted as being due to the differences of the excess 
male mortality with time for these age groups being not significant, 
but only fluctuations around one and the same mean. From one 
year of life onwards the ratio increases, first slowly, then faster; it 
reaches a peak between 40 and 60, if not 40 and 70, which indicates 
that for these age groups the differences are considerable and that 
there is a systematic trend with time. For the higher age groups 
above 70 we find the ratio Q again decreasing and approaching unity. 
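The logic of the ratio Q can be sketched as follows. The code assumes a series of yearly proportions each based on the same number of cases n, uses the divisor k - 1 in the root-mean-square formula, and takes the combinatorial value as the square root of pq/n; these choices, the function name and the two series are illustrative assumptions, not Lexis's actual figures.

```python
from math import sqrt


def lexis_ratio(proportions, n):
    """Ratio Q of the observed (root-mean-square) standard error of a series
    of proportions to the value expected under simple sampling."""
    k = len(proportions)
    mean_p = sum(proportions) / k
    observed = sqrt(sum((p - mean_p) ** 2 for p in proportions) / (k - 1))
    expected = sqrt(mean_p * (1 - mean_p) / n)
    return observed / expected


n = 10_000  # hypothetical number of cases per year

stable = [0.500, 0.505, 0.495, 0.502, 0.498]    # chance fluctuation only
trending = [0.480, 0.490, 0.500, 0.510, 0.520]  # systematic drift with time

q_stable = lexis_ratio(stable, n)
q_trending = lexis_ratio(trending, n)
print(round(q_stable, 2), round(q_trending, 2))
```

A series that merely fluctuates about a fixed mean gives Q in the neighbourhood of unity, whereas a series with a systematic trend gives Q well above unity, which is how the age groups of table 10 are being read.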

We have, therefore, on the one hand, the result reached before, 
that the excess male mortality has its peak in the higher age groups, 
between 40 and 60 or 40 and 70, and on the other hand, that for 
these age groups we also find a more pronounced trend with time. 
The explanation of this phenomenon can clearly not be sought solely 
in a greater exposure to environmental risk or in genetical differences 
between the sexes. Occupational risk may increase either with age 
of life or with time insofar as certain new occupations may represent 
a new or additional risk: but on the whole the tendency is surely 
one of decreasing occupational risk. Furthermore, it is a matter 
of common experience that the proneness to accidents in industrial 
occupations rather decreases with age because, as the worker becomes 
more experienced, he also learns how to avoid accidents. 

The genetical hypothesis does not seem to be able to explain 
why excess male mortality should reach its peak between 40 and 60, 
nor why there should be a greater variability for these age groups. 
One would be inclined to conclude that recessive characteristics for 
diseases would find expression soon after birth, which is in agreement 
with the high excess male mortality for infants, neonatal mortality 
and still births. 

That recessive characteristics for diseases terminating in death 
should wait for the age of full maturity and after, before expressing 
themselves is not very plausible, and it is, therefore, the third hypo- 
thesis to which we turn in order to explain the phenomenon of 
the differences in excess male mortality with age of life. 

This hypothesis makes the metabolic and physiological differen- 
ces between the sexes responsible for the phenomenon of excess male 
mortality. 

There are certain differences, metabolic and physiological, which 
appear early in the life of the embryo and which give rise to 
endocrinological differences. Once these differences are established they take 
charge of the further differentiation between the sexes. The initial 
genetic constitution would seem to determine which of the two 
alternative types of differentiation shall occur, and with the 
development of the endocrinological system, maleness or femaleness becomes 
finally established. 


These two states are among other things distinguished by definite 
differences in oxidation rate. Experiment has shown that the higher 
metabolic rate of the male renders him less resistant to unfavourable 
conditions and more prone to death. 


To borrow an expression by Claude Bernard about the internal 
environment, we may speak of the differences in that respect between 
the sexes, which must be due to their respective endocrinological make- 
up, as creating either a male or a female internal environment. It 
will be generally admitted that the female is, as a rule, nearer to the 
norm than the male. There is greater variability in males than in 
females due to the fact that we may speak of one predominant 
function of the female between the ages of 20 and 45, which is child- 
bearing and childrearing, whereas there is no such predominant 
function and occupation in the life of the male. He is therefore more 
liable to vary according to his activity. Such variation will lead to 
greater individualisation as time proceeds and will make itself fully 
felt in the higher ages. One might compare this with the spreading 
out of the radii of a circle, the distance between which near the centre 
is small but increases as the length of the radii increases. The increased 
individualisation of the human male according to occupation and 
activity in general will lead him far away from the norm,—if we could 
speak of a norm at all,—so as to make him subject to certain influences 
and decrease his resistance to disease. The further his activity 
removes him from the natural conditions of life the greater will his 
liability to disease on this account become. To that extent, his occupational 
environment must be regarded as a potent factor. 

The full development of what we might call the male internal 
environment will not be reached before his habits become inveterate 
which, as is well known, is one cause of decay. In accordance with 
the facts we may, therefore, think of the years between 40 and 60 as 
representing the peak of the development of the male internal 
environment. After 60, advanced age, with its deprivations in the way of 
activity, acts as an equalising or levelling force which produces the 
drop in excess male mortality, in accordance with this explanation. 


Conclusions. 


On the basis of the foregoing discussion, the conclusion appears 
to be justified that the three hypotheses which were advanced for the 
explanation of the phenomenon of excess male mortality, viz. the 
hypothesis of increased occupational and environmental risk in the 
male, the genetical hypothesis and the physiological hypothesis, 
should not be considered as mutually exclusive. Each of these 
explanations appears to have its proper sphere of application accord- 
ing to age of life and to the particular aspect of excess male mortality. 
The explanation of increased occupational and environmental risk for 
the male applies to those age groups in which the male may be said 
to be exposed to higher risk of dying from violence and from occupa- 
tional diseases. This hypothesis does not provide a general explanation 
of the phenomenon in question but applies, strictly speaking, only 
to certain causes of death, among which violence and occupational 
disease are the most important, and to certain age groups. 

The genetical hypothesis, which ascribes excess male mortality to 
the presence of recessive sex-linked or sex-limited genes in the male 
hereditary make-up, appears to be suitable for the explanation of in- 
fant mortality and also for the change in overall mortality with time. 

The functional (physiological) hypothesis affords an explanation of the 
phenomenon of the age specific differences in excess male mortality 
at a given time and of the change the excess male mortality for a 
given age group undergoes with time. It considers the higher degree 
of specialisation and individualisation in the male which reaches its 
peak in the higher age groups as being responsible for the peak of 
excess male mortality in these ages, and, furthermore, it considers the 
progressive individualisation of the human race as being responsible 
for the increase of the age specific excess male mortality with time. 
In this respect, however, the two hypotheses, the occupational and 
the physiological, must be regarded as concurrent explanations of 
one and the same phenomenon. 


Résumé. 

The study of the relevant phenomena, together with an examination 
of the various hypotheses put forward to explain excess male 
mortality, leads us to the conclusion that the three hypotheses, viz. 
the hypothesis of differential occupational and environmental risk, 
the hereditary hypothesis and the functional hypothesis, should not 
be considered as mutually exclusive. Each of these explanations 
appears to have its own particular sphere of application according to 
the age of the persons concerned and to the particular aspect of 
excess male mortality. 


Zusammenfassung. 


The study of the phenomenon of excess mortality among males, and 
of the explanations advanced for it, leads to the result that the three 
hypotheses, namely that of an increased risk to the male in his 
occupation, the hereditary hypothesis and the functional hypothesis, 
need not be regarded as mutually exclusive, and that each of them 
has its own sphere of operation, according to the age of the person 
and to the particular aspect of the excess mortality. 

Michael Schwartz: Heredity in Bronchial Asthma. Munksgaard, Copenhagen, 
Denmark, 1952, 288 pp. 

This extensive monograph deals with bronchial asthma and allergy and the 
genetic background of such conditions. The first part of the book, 134 pp., gives a 
comprehensive review of previous experimental, clinical and genetical studies of 
allergy. The author concludes that at present one cannot speak of allergic diseases 
as a well defined group. A number of studies indicate that various allergic conditions 
in man, notably bronchial asthma and hay fever, develop on the basis of genetic 
factors. However, data concerning the mode of genetic transmission and adequate 
risk figures for different relatives of allergic index cases are scanty or lacking. 

For his own study the author has chosen to start with individuals suffering 
from bronchial asthma. The data comprise three sets of propositi, namely 191 
cases with bronchial asthma (group A), 200 control propositi (group B) and 50 
cases with baker’s asthma (group C). The following groups of relatives of these 
propositi were investigated: children, parents, parents’ siblings and grandparents. 
The follow-up examinations and the examinations of the relatives were to a very 
great extent performed by the author personally, thus securing a uniform body of 
data which should be well adapted to a genetic and statistical analysis. The 
occurrence of the following diseases among the relatives was noted: bronchial 
asthma, hay fever, vasomotor rhinitis, Besnier’s prurigo, eczema, urticaria, 
Quincke’s edema, migraine, gastrointestinal allergy, epilepsy, ichthyosis and 
psoriasis, of which, as the author points out, the three last-mentioned diseases 
are of questionable allergic etiology. 
Adequate definitions of the diagnostic criteria used for each condition are given. 


The genetic analysis of the data was performed according to the principles 
of Weinberg’s propositus method. The author finds that the incidence of asthma, 
vasomotor rhinitis and Besnier’s prurigo was significantly higher among the relatives 
of groups A and C as compared with the relatives of group B. He concludes that 
bronchial asthma is an inherited disease and that vasomotor rhinitis and Besnier’s 
prurigo (and possibly hay fever) are genetically related to asthma. An analysis of 
the mode of inheritance in bronchial asthma and related diseases leads to the assump- 
tion that the transmission follows the scheme of monohybrid dominance with 
incomplete penetrance and variable expressivity. 

As the reviewer is not competent to deal with the specific problems concerning 
allergy, the clinical parts of the book have been accepted at their face value. How- 
ever, some of the pertinent statistical procedures used by the author appear question- 
able. 

The author does not explain in detail how he calculated the morbid risks 
(chapter 14). Evidently he did not follow current methods. Take tables 47 and 48 
as an example. The number of onsets of bronchial asthma per survived 5-year 
group is put in relation to the number of individuals observed in each group and 
the incidence per thousand is given. So far this is correct, and apparently the 
author has had some kind of morbidity table in mind. Then he adds all these 
incidences per thousand for all the 5-year groups and obtains, e.g. for siblings 
of the A-propositi, a morbid risk of 236.9 per thousand. Now you cannot add the 
probabilities for each 5-year group like that. The probability of falling ill is, 
of course, greater if more 5-year periods are available than if there were only 
one. This cumulative probability is, however, not the sum of the probabilities 
for each period. It increases according to more complex rules. For instance, the 
probability of getting an even number by throwing a die is 1/2. If you make three 
casts, the probability of getting at least one even number is not 
1/2 + 1/2 + 1/2 = 1.5, which would be absurd, but 7/8. The correct way to deal 
with this calculation is to turn it over and calculate the chance of not falling 
ill for each 5-year period. These probabilities can be multiplied, and the result 
gives the total lifetime probability of not getting the disease. This probability 
subtracted from 1, finally, gives the morbid risk. By applying the correct method 
to column 1 of tables 47 and 48 I obtained a morbid risk concerning bronchial 
asthma for the siblings of the A-propositi amounting to 215.5 per thousand 
instead of 236.9 per thousand. The difference as such is not important, but this 
does not justify an apparently incorrect procedure. If the 5-year risks had been 
considerably greater, however, the differences would easily have amounted to 
substantial figures. 
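
The contrast between adding the period risks and multiplying the chances of 
escaping the disease can be sketched in a few lines of Python. This is only an 
illustration of the arithmetic described in the preceding paragraph: the function 
names are hypothetical, the 5-year risks are invented values and not those of 
tables 47 and 48, and each period risk is assumed to be the chance of onset for 
a person still free of the disease at the start of that period. 

```python
from math import prod


def naive_sum(period_risks):
    """Add the per-period risks directly (the procedure criticised above)."""
    return sum(period_risks)


def cumulative_risk(period_risks):
    """Risk of at least one onset over all periods.

    Multiplies the chances of NOT falling ill in each period and
    subtracts the product from 1.
    """
    return 1.0 - prod(1.0 - p for p in period_risks)


# The die example: three casts, each with chance 1/2 of an even number.
casts = [0.5, 0.5, 0.5]
print(naive_sum(casts))        # 1.5, not a valid probability
print(cumulative_risk(casts))  # 0.875, i.e. 7/8

# Hypothetical 5-year risks as proportions (NOT taken from tables 47 and 48).
five_year_risks = [0.010, 0.020, 0.030, 0.040]
print(naive_sum(five_year_risks) > cumulative_risk(five_year_risks))  # True
```

With small period risks the two figures lie close together, which agrees with 
the modest difference found above; the gap widens as the individual risks grow. 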

True, the author’s morbid risk differences between the relatives of the 
asthmatic propositi and the relatives of the control propositi appear so great that 
one does not seriously question them, but the handling of the data has been some- 
what irritating to the reviewer. 

The author has no doubt collected a fine body of primary data and these will 
remain a most welcome contribution to our knowledge of bronchial asthma and 
allied disorders. Jan A. Bédk, Uppsala 
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